
This document provides a quick high-level overview of the functionality that the Predictive Analytics module offers and the concepts that are important in connection with it. If you want to learn how to use the module, click here for tutorials, and view this page for a more in-depth explanation and background of the methods used. Currently, two analyses are available: the Univariate Predictive Analytics analysis provides univariate time series predictions to monitor whether a single process will go out of control. The Multivariate Binomial Control analysis, on the other hand, models the out-of-control proportion of one or multiple processes and predicts how this proportion will evolve in the future, which makes it more suitable for monitoring multiple variables at the same time.

Process Control

In process control, we usually have a process that continuously generates data, and we want to monitor it over time to make sure that it stays within the desired boundaries. For example, in a health care setting we might be interested in monitoring the absenteeism rate of employees in order to detect a potential outbreak of a disease (Woodall & Montgomery, 2014). In an industrial context, we might have a continuous production process and be interested in monitoring the dimensions of a produced part to ensure that it is reliable enough to be sold.

The Univariate Predictive Analytics analysis offers several options to determine the control limits that decide whether a process is out of control. The control limits can either be set manually or derived from the data by selecting the number of standard deviations from the mean beyond which data points are considered out of control. Alternatively, the control limits can be computed directly from (a subset of) the dataset being investigated. The analysis also includes a basic quality control chart, process control summary statistics, and outlier tables.
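The "number of standard deviations from the mean" rule can be sketched as follows. This is a minimal illustration, not the module's implementation; the function names `control_limits` and `out_of_control` are hypothetical.

```python
import numpy as np

def control_limits(data, k=3.0):
    """Compute lower/upper control limits as mean +/- k standard deviations.

    `k` is the number of standard deviations beyond which a data point
    is considered out of control (3 is a common default).
    """
    mean = np.mean(data)
    sd = np.std(data, ddof=1)  # sample standard deviation
    return mean - k * sd, mean + k * sd

def out_of_control(data, lower, upper):
    """Return a boolean mask marking points outside the control limits."""
    data = np.asarray(data)
    return (data < lower) | (data > upper)
```

Setting the limits manually simply corresponds to passing fixed `lower`/`upper` values instead of computing them from the data.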

The Multivariate Binomial Control analysis currently allows setting the control bounds manually for several variables at the same time. It includes control charts for both the exact number of out-of-control data points and the overall out-of-control proportion over time.
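The quantities plotted in those charts can be sketched as below: for each time point, count how many of the monitored variables fall outside their bounds and divide by the number of variables. This is a simplified illustration with a hypothetical function name, not the module's code.

```python
import numpy as np

def out_of_control_proportion(data, lower, upper):
    """Count and proportion of out-of-control variables per time point.

    `data`           : array of shape (T, p), T time points, p variables
    `lower`, `upper` : per-variable control bounds, arrays of length p
    """
    data = np.asarray(data, dtype=float)
    ooc = (data < lower) | (data > upper)  # boolean (T, p) mask
    counts = ooc.sum(axis=1)               # out-of-control variables per time point
    return counts, counts / data.shape[1]
```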

Probabilistic Time Series Prediction

Often we are not only interested in monitoring whether a process is currently out of control, but also in whether it is going to be out of control in the future. For this we can make use of time series prediction. This is especially valuable in a production process with multiple production steps: detecting that the system produces faulty parts only at the end of the production cycle increases the loss, as all parts in the current cycle might need to be discarded. Being able to predict whether a process is going to go out of control can therefore save significant cost and time, as it enables us to act in advance.

The Univariate Predictive Analytics analysis therefore includes several time series prediction models that are trained on historical data to make predictions about the future. The available models range from classical time series models (e.g., state space models, Prophet) to more flexible machine learning models (Bayesian additive regression trees). Additionally, we put an emphasis on probabilistic forecasting methods (Gneiting & Katzfuss, 2014), as this allows us to quantify the uncertainty in our estimates and to predict not only if and when a process will go out of control, but also with what probability.
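The "with what probability" part follows naturally from a probabilistic forecast: given samples from a model's predictive distribution, the out-of-control probability at each forecast horizon is just the fraction of samples beyond the control limit. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def prob_out_of_control(forecast_samples, upper):
    """Estimate, per forecast horizon, the probability that the process
    exceeds the upper control limit.

    `forecast_samples` : array of shape (n_samples, horizon) drawn from
                         a model's predictive distribution
    """
    samples = np.asarray(forecast_samples)
    return (samples > upper).mean(axis=0)
```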

The Multivariate Binomial Control analysis, on the other hand, allows us to predict how the overall proportion of out-of-control variables will evolve over time, and whether it will reach a certain threshold.

Forecast Evaluation

In order to make decisions based on our predictions, we need to have confidence in their accuracy. One common way to evaluate forecast accuracy is to repeatedly split the available historical data into a training and a test set. The training set is used to train the statistical model, which then makes predictions about the test set that are compared to the real observations. We can compute various forecasting metrics that quantify the accuracy of each model and help determine the best one.
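The repeated train/test splitting described above is often done in a rolling-origin fashion for time series: the training window grows forward in time and the model is scored on the points immediately after it. A minimal sketch, assuming a growing training window (function names are hypothetical):

```python
import numpy as np

def rolling_origin_splits(n, initial, horizon, step=1):
    """Yield (train_idx, test_idx) index pairs for rolling-origin
    forecast evaluation: the training window grows from `initial`
    points onward, and each test window is the next `horizon` points."""
    for end in range(initial, n - horizon + 1, step):
        yield np.arange(end), np.arange(end, end + horizon)

def mean_absolute_error(y_true, y_pred):
    """One simple point-forecast accuracy metric."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```

Averaging a metric such as the mean absolute error over all splits gives a single accuracy score per model.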

Currently, only the Univariate Predictive Analytics analysis offers forecast evaluation. The accuracy is assessed by forecast metrics based on point predictions (such as R-squared or the mean absolute error) as well as probabilistic metrics that take into account the whole predictive distribution (e.g., the logarithmic score or the continuous ranked probability score).
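To make the probabilistic metrics concrete, the continuous ranked probability score (CRPS) can be estimated directly from predictive samples using the standard identity CRPS(F, y) = E|X − y| − ½ E|X − X′| with X, X′ ~ F independent. A small sketch (not the module's implementation):

```python
import numpy as np

def crps_from_samples(samples, y):
    """Sample-based estimate of the Continuous Ranked Probability Score:
    CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with X, X' ~ F independent.
    Lower is better; for a point forecast it reduces to the absolute error."""
    x = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(x - y))                       # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 * E|X - X'|
    return term1 - term2
```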

Ensemble Bayesian Model Averaging

Instead of relying on a single best model to predict the future, we can also combine the predictions of different models. First, this has the benefit of taking into account the uncertainty we have in our model selection, as simply selecting a single best model might produce overconfident results (Wagenmakers et al., 2022). Additionally, the problem at hand might be too complex to be represented well by a single model; combining different models might provide a better prediction if each model captures a unique aspect of the problem (Sagi & Rokach, 2018). Lastly, it reduces the chance that the user selects the wrong model, as they don't have to manually pick one.

The Univariate Predictive Analytics analysis includes ensemble Bayesian model averaging as a method to combine the predictions of different models (Raftery et al., 2005). Each model is weighted according to its historical predictive accuracy (see the previous section), and the weighted predictions are then combined to predict the future. Each model weight can be interpreted as the relative probability of that model being the true one.
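The weighting-and-mixing idea can be sketched as follows. Note that the actual ensemble-BMA procedure of Raftery et al. (2005) estimates the weights by maximum likelihood via EM; the softmax over average log scores below is a simplified stand-in, and all function names are hypothetical.

```python
import numpy as np

def bma_weights(avg_log_scores):
    """Turn each model's average historical log predictive score into a
    normalized weight (a softmax over the average log scores)."""
    s = np.asarray(avg_log_scores, dtype=float)
    w = np.exp(s - s.max())  # subtract max for numerical stability
    return w / w.sum()

def bma_mixture_samples(model_samples, weights, n, rng=None):
    """Draw `n` samples from the weighted mixture of the models'
    predictive samples: first pick a model with probability equal to
    its weight, then draw one of that model's samples."""
    rng = np.random.default_rng(rng)
    picks = rng.choice(len(model_samples), size=n, p=weights)
    return np.array([rng.choice(model_samples[k]) for k in picks])
```

The mixture samples represent the combined predictive distribution, from which out-of-control probabilities or intervals can be computed as for any single model.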

Reporting Mode

This module also makes use of the new Reporting Mode functionality that was recently added to JASP. That way, operators and users can be informed when a process reaches a threshold or is predicted to do so in the future. Especially now that JASP can also connect to the most frequently used databases, the module can integrate into live production processes and provide near real-time feedback!

Gneiting, T., & Katzfuss, M. (2014). Probabilistic Forecasting. Annual Review of Statistics and Its Application, 1(1), 125–151. https://doi.org/10.1146/annurev-statistics-062713-085831
Raftery, A. E., Gneiting, T., Balabdaoui, F., & Polakowski, M. (2005). Using Bayesian Model Averaging to Calibrate Forecast Ensembles. Monthly Weather Review, 133(5), 1155–1174. https://doi.org/10.1175/MWR2906.1
Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. WIREs Data Mining and Knowledge Discovery, 8(4). https://doi.org/10.1002/widm.1249
Wagenmakers, E.-J., Sarafoglou, A., & Aczel, B. (2022). One statistical analysis must not rule them all. Nature, 605(7910), 423–425. https://doi.org/10.1038/d41586-022-01332-8
Woodall, W. H., & Montgomery, D. C. (2014). Some Current Directions in the Theory and Application of Statistical Process Monitoring. Journal of Quality Technology, 46(1), 78–94. https://doi.org/10.1080/00224065.2014.11917955