The online Cramér-von Mises detector is a non-parametric method for online drift detection on continuous data. Like the offline Cramér-von Mises detector, it applies a univariate Cramér-von Mises (CVM) test to each feature. This detector is an adaptation of the one proposed by Ross et al.
Warning
This detector is multi-threaded, with Numba used to parallelise over the simulated streams. There is a known issue on macOS, where Numba's default OpenMP threading layer causes segfaults. A workaround is to use the slightly less performant workqueue threading layer on macOS, either by setting the NUMBA_THREADING_LAYER environment variable or by assigning the same value to Numba's THREADING_LAYER configuration option before the detector is used.
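A minimal sketch of the workaround in Python is given below. It assumes the code runs before Numba compiles any parallel function (in practice, before the detector is imported and instantiated); the guarded import is only there so the snippet also runs where Numba is absent.

```python
import os

# Option 1: set the environment variable before Numba is imported.
os.environ["NUMBA_THREADING_LAYER"] = "workqueue"

# Option 2: set the equivalent Numba configuration option directly.
try:
    from numba import config

    config.THREADING_LAYER = "workqueue"
except ImportError:
    # Numba is not installed in this environment; nothing else to configure.
    pass
```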
Online detectors assume the reference data is large and fixed, and operate on single data points at a time (rather than batches). These data points are passed into the test-windows, and a two-sample test-statistic between the reference data and each test-window is computed at each time-step. When the test-statistic exceeds a preconfigured threshold, drift is detected. Configuring the thresholds requires specifying the expected run-time (ERT), which is the number of time-steps the detector should run for, on average, in the absence of drift before making a false detection. Thresholds are then configured to target this ERT by simulating n_bootstraps streams of length t_max = 2*max(window_sizes) - 1. Conveniently, the non-parametric nature of the detector means that thresholds depend only on $M$, the length of the reference data set. Therefore, for multivariate data, configuration is only as costly as the univariate case.
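The sliding-window mechanism described above can be illustrated with a small pure-Python sketch. This is not the library's implementation: the helper names are hypothetical, a single window is used, and the fixed threshold of 2.0 is an arbitrary value chosen for illustration, whereas the detector configures its thresholds by simulation to target the ERT.

```python
import random
from bisect import bisect_right
from collections import deque


def cvm_two_sample(ref_sorted, window):
    """Two-sample Cramér-von Mises statistic between sorted reference data and a window."""
    m, w = len(ref_sorted), len(window)
    win_sorted = sorted(window)
    total = 0.0
    # Sum the squared difference of the two empirical CDFs over the pooled sample.
    for z in list(ref_sorted) + win_sorted:
        f_ref = bisect_right(ref_sorted, z) / m
        f_win = bisect_right(win_sorted, z) / w
        total += (f_ref - f_win) ** 2
    return total * m * w / (m + w) ** 2


def first_detection(x_ref, stream, window_size, threshold):
    """Return the first time-step at which the statistic exceeds the threshold."""
    ref_sorted = sorted(x_ref)
    window = deque(maxlen=window_size)  # sliding test-window
    for t, x_t in enumerate(stream):
        window.append(x_t)
        if len(window) < window_size:
            continue  # the window must be full before testing
        if cvm_two_sample(ref_sorted, window) > threshold:
            return t
    return None


rng = random.Random(0)
x_ref = [rng.gauss(0.0, 1.0) for _ in range(500)]
# No drift for the first 100 steps, then a shift in the mean.
stream = [rng.gauss(0.0, 1.0) for _ in range(100)]
stream += [rng.gauss(3.0, 1.0) for _ in range(100)]

t_detect = first_detection(x_ref, stream, window_size=20, threshold=2.0)
print("first detection at t =", t_detect)
```

A smaller window fills with drifted points sooner and so reacts faster to a severe shift like this one, while a larger window averages over more points and is better suited to slight drift.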
Note
In order to reduce the memory requirements of the threshold configuration process, streams are simulated in batches of size $N_{batch}$, set with the batch_size keyword argument. However, the memory requirements still scale as $O(M^2 N_{batch})$. If configuration requires too much memory (or time), consider subsampling the reference data. The quadratic growth of the cost with the number of reference instances $M$, combined with the diminishing gains in test power, often makes this a worthwhile tradeoff.
Specification of test-window sizes (the detector accepts multiple windows of different size $W$) is also required, with smaller windows allowing faster response to severe drift and larger windows allowing more power to detect slight drift. Since this detector requires the windows to be full to function, the ERT is measured from t = min(window_sizes)-1.
Although this detector is primarily intended for univariate data, it can also be applied to multivariate data. In this case, the detector makes a correction similar to the Bonferroni correction used for the offline detector. Given $d$ features, the detector configures thresholds by targeting the $1-\beta$ quantile of test statistics over the simulated streams, where $\beta = 1 - (1-(1/ERT))^{(1/d)}$. For the univariate case, this simplifies to $\beta = 1/ERT$. At prediction time, drift is flagged if the test statistic of any feature stream exceeds its threshold.
Note
In the multivariate case, for the ERT's upper bound to be accurate, the feature streams must be independent. Regardless of independence, the ERT will still be properly lower bounded.
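The correction can be checked numerically with a short sketch. The values of ert and d below are hypothetical; the calculation only illustrates that, for independent feature streams, the per-feature $\beta$ recovers an overall false detection probability of $1/ERT$ per time-step.

```python
ert = 150  # hypothetical expected run-time
d = 3      # hypothetical number of features

beta_univariate = 1 / ert
beta = 1 - (1 - 1 / ert) ** (1 / d)

# Probability that at least one of d independent feature streams
# exceeds its threshold at a given time-step.
p_any = 1 - (1 - beta) ** d

print(f"beta (d=1): {beta_univariate:.6f}")
print(f"beta (d={d}): {beta:.6f}")
print(f"per-step false detection probability: {p_any:.6f}")
```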
Arguments:
x_ref
: Data used as reference distribution.
ert
: The expected run-time in the absence of drift, starting from t=min(window_sizes).
window_sizes
: The sizes of the sliding test-windows used to compute the test-statistics. Smaller windows focus on responding quickly to severe drift, larger windows focus on ability to detect slight drift.
Keyword arguments:
preprocess_fn
: Function to preprocess the data before computing the data drift metrics.
n_bootstraps
: The number of bootstrap simulations used to configure the thresholds. The larger this is, the more accurately the desired ERT will be targeted. Should ideally be at least an order of magnitude larger than the ERT.
batch_size
: The maximum number of bootstrap simulations to compute in each batch when configuring thresholds. A smaller batch size reduces memory requirements, but can result in a longer configuration run time.
n_features
: Number of features used in the CVM test. No need to pass it if no preprocessing takes place. If there is a preprocessing step, this can also be inferred automatically, but that could be more expensive to compute.
verbose
: Whether or not to print progress during configuration.
input_shape
: Shape of input data.
data_type
: Optionally specify the data type (tabular, image or time-series). Added to metadata.
Initialized drift detector example:
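A minimal sketch of initialization follows, assuming alibi-detect is installed and that x_ref is array-like reference data. The helper name build_detector and the default values for ert and window_sizes are hypothetical and chosen only for illustration.

```python
def build_detector(x_ref, ert=150, window_sizes=(20, 40)):
    """Configure an online CVM detector on the reference data x_ref."""
    # Imported inside the function so the sketch can be read and loaded
    # without alibi-detect installed; calling it does require the library.
    from alibi_detect.cd import CVMDriftOnline

    # Positional arguments follow the order listed above:
    # x_ref, ert, window_sizes.
    return CVMDriftOnline(x_ref, ert, list(window_sizes))


# Usage (thresholds are configured at construction time, which can take a while):
# cd = build_detector(x_ref)
```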
We detect data drift by sequentially calling predict on single instances x_t (no batch dimension) as they arrive. We can return the test-statistic and the threshold by setting return_test_stat to True.
The prediction takes the form of a dictionary with meta and data keys. meta contains the detector's metadata, while data is also a dictionary, which contains the actual predictions stored in the following keys:
is_drift
: 1 if any of the test-windows have drifted from the reference data and 0 otherwise.
time
: The number of observations that have so far been passed to the detector as test instances.
ert
: The expected run-time the detector was configured to run at in the absence of drift.
test_stat
: CVM test-statistics between the reference data and the test_windows if return_test_stat equals True.
threshold
: The values the test-statistics are required to exceed for drift to be detected if return_test_stat equals True.
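The prediction loop and the dictionary layout described above can be sketched as follows. The helper run_until_drift is hypothetical; it accepts any object whose predict method returns a dictionary with the keys listed above, such as an initialized online detector.

```python
def run_until_drift(detector, stream):
    """Pass instances to the detector one at a time until drift is flagged.

    Returns (time, test_stat, threshold) from the first prediction with
    is_drift == 1, or None if the stream ends without a detection.
    """
    for x_t in stream:
        pred = detector.predict(x_t, return_test_stat=True)
        data = pred["data"]
        if data["is_drift"] == 1:
            return data["time"], data["test_stat"], data["threshold"]
    return None


# Usage, assuming cd is an initialized detector and stream yields single instances:
# result = run_until_drift(cd, stream)
```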
The detector's state may be saved with the save_state method, and a previously saved state may then be loaded via the load_state method.
At any point, the state may be reset to t=0 with the reset_state method. When saving the detector with save_detector, the state will be saved, unless t=0.
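The three state methods can be combined as in the sketch below. The helper name rewind_then_restore and the filepath argument are hypothetical; the sketch assumes save_state and load_state each take a path to a directory used for the state files.

```python
def rewind_then_restore(detector, filepath):
    """Save the current state, reset to t=0, then restore the saved state."""
    detector.save_state(filepath)  # persist the current test-windows and time-step
    detector.reset_state()         # detector is back at t=0
    # ... the detector could be run from t=0 on another stream here ...
    detector.load_state(filepath)  # return to the previously saved state


# Usage, assuming cd is an initialized detector:
# rewind_then_restore(cd, "cvm_online_state")
```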