The online Cramér-von Mises detector is a non-parametric method for online drift detection on continuous data. Like the offline Cramér-von Mises detector, it applies a univariate Cramér-von Mises (CVM) test to each feature. This detector is an adaptation of the method proposed by Ross et al.
Warning
This detector is multi-threaded, with Numba used to parallelise over the simulated streams. There is a known issue on MacOS, where Numba's default OpenMP threading layer causes segfaults. A workaround is to use the slightly less performant `workqueue` threading layer on MacOS by setting the `NUMBA_THREADING_LAYER` environment variable or running:
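For example, one way to select the threading layer from Python (this must run before any Numba-parallelised code executes):

```python
# Select Numba's workqueue threading layer from Python.
# Note: this must be set before the detector is instantiated,
# i.e. before any Numba-compiled code runs.
from numba import config

config.THREADING_LAYER = 'workqueue'
```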
Online detectors assume the reference data is large and fixed and operate on single data points at a time (rather than batches). These data points are passed into the test-windows, and a two-sample test-statistic between the reference data and test-window is computed at each time-step. When the test-statistic exceeds a preconfigured threshold, drift is detected. Configuration of the thresholds requires specification of the expected run-time (ERT): the number of time-steps that the detector, on average, should run for in the absence of drift before making a false detection. Thresholds are then configured to target this ERT by simulating `n_bootstraps` streams of length `t_max = 2*max(window_sizes) - 1`. Conveniently, the non-parametric nature of the detector means that thresholds depend only on $M$, the length of the reference data set. Therefore, for multivariate data, configuration is only as costly as in the univariate case.
Note
In order to reduce the memory requirements of the threshold configuration process, streams are simulated in batches of size $N_{batch}$, set with the `batch_size` keyword argument. However, the memory requirements still scale as $O(M^2N_{batch})$. If configuration requires too much memory (or time), consider subsampling the reference data. The quadratic growth of the cost with respect to the number of reference instances $M$, combined with the diminishing increase in test power, often makes this a worthwhile tradeoff.
Specification of test-window sizes (the detector accepts multiple windows of different size $W$) is also required, with smaller windows allowing faster response to severe drift and larger windows allowing more power to detect slight drift. Since this detector requires the windows to be full to function, the ERT is measured from `t = min(window_sizes)-1`.
Although this detector is primarily intended for univariate data, it can also be applied to multivariate data. In this case, the detector makes a correction similar to the Bonferroni correction used for the offline detector. Given $d$ features, the detector configures thresholds by targeting the $1-\beta$ quantile of test statistics over the simulated streams, where $\beta = 1 - (1-(1/ERT))^{(1/d)}$. For the univariate case, this simplifies to $\beta = 1/ERT$. At prediction time, drift is flagged if the test statistic of any feature stream exceeds its threshold.
Note
In the multivariate case, for the ERT's upper bound to be accurate, the feature streams must be independent. Regardless of independence, the ERT will still be properly lower bounded.
Arguments:
- `x_ref`: Data used as reference distribution.
- `ert`: The expected run-time in the absence of drift, starting from `t=min(window_sizes)`.
- `window_sizes`: The sizes of the sliding test-windows used to compute the test-statistics. Smaller windows focus on responding quickly to severe drift, larger windows focus on the ability to detect slight drift.
Keyword arguments:
- `preprocess_fn`: Function to preprocess the data before computing the data drift metrics.
- `n_bootstraps`: The number of bootstrap simulations used to configure the thresholds. The larger this is, the more accurately the desired ERT will be targeted. Should ideally be at least an order of magnitude larger than the ERT.
- `batch_size`: The maximum number of bootstrap simulations to compute in each batch when configuring thresholds. A smaller batch size reduces memory requirements, but can result in a longer configuration run time.
- `n_features`: Number of features used in the CVM test. No need to pass it if no preprocessing takes place. In case of a preprocessing step, this can also be inferred automatically but could be more expensive to compute.
- `verbose`: Whether or not to print progress during configuration.
- `input_shape`: Shape of input data.
- `data_type`: Optionally specify the data type (tabular, image or time-series). Added to metadata.
Initialized drift detector example:
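A minimal sketch; the reference data and settings below are illustrative:

```python
import numpy as np
from alibi_detect.cd import CVMDriftOnline

x_ref = np.random.randn(1000, 3)  # illustrative continuous reference data

ert = 150                # expected run-time in the absence of drift
window_sizes = [20, 40]  # multiple window sizes are accepted
cd = CVMDriftOnline(x_ref, ert, window_sizes)
```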
We detect data drift by sequentially calling `predict` on single instances `x_t` (no batch dimension) as they each arrive. We can return the test-statistic and the threshold by setting `return_test_stat` to `True`.
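For example, a sketch of the prediction loop, with illustrative incoming instances:

```python
import numpy as np

for t in range(1000):
    x_t = np.random.randn(3)  # illustrative single instance, no batch dimension
    preds = cd.predict(x_t, return_test_stat=True)
    if preds['data']['is_drift']:
        print(f"Drift detected at time {preds['data']['time']}")
        break
```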
The prediction takes the form of a dictionary with `meta` and `data` keys. `meta` contains the detector's metadata while `data` is also a dictionary which contains the actual predictions stored in the following keys:
- `is_drift`: 1 if any of the test-windows have drifted from the reference data and 0 otherwise.
- `time`: The number of observations that have so far been passed to the detector as test instances.
- `ert`: The expected run-time the detector was configured to run at in the absence of drift.
- `test_stat`: CVM test-statistics between the reference data and the test-windows if `return_test_stat` equals `True`.
- `threshold`: The values the test-statistics are required to exceed for drift to be detected if `return_test_stat` equals `True`.
The detector's state may be saved with the `save_state` method:
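```python
# the filepath is arbitrary
cd.save_state('./detector_state')
```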
The previously saved state may then be loaded via the `load_state` method:
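```python
# restore the state saved above
cd.load_state('./detector_state')
```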
At any point, the state may be reset to `t=0` with the `reset_state` method. When saving the detector with `save_detector`, the state will be saved, unless `t=0` (see here).
The online Fisher's Exact Test (FET) detector is a non-parametric method for online drift detection. Like the offline FET detector, it applies a Fisher's Exact Test to each feature. It is intended for application to streams of binary data, consisting of either `(True, False)` or `(0, 1)` values. This detector is ideal for use in a supervised setting, monitoring drift in a model's instance-level accuracy (i.e. correct prediction = 0, and incorrect prediction = 1).
Online detectors assume the reference data is large and fixed and operate on single data points at a time (rather than batches). These data points are passed into the test-windows, and a two-sample test-statistic (in this case $F=1-\hat{p}$) between the reference data and test-window is computed at each time-step. When the test-statistic exceeds a preconfigured threshold, drift is detected. Configuration of the thresholds requires specification of the expected run-time (ERT): the number of time-steps that the detector, on average, should run for in the absence of drift before making a false detection.
In a similar manner to that proposed by Ross et al. [1], thresholds are configured by simulating `n_bootstraps` Bernoulli streams. The length of the streams can be set with the `t_max` parameter. Since the thresholds are expected to converge after `t_max = 2*max(window_sizes) - 1` time steps, we only need to simulate trajectories and estimate thresholds up to this point, and `t_max` is set to this value by default. Following [1], the test statistics are smoothed using an exponential moving average to remove their discreteness, allowing more precise quantiles to be targeted:

$$\tilde{F}_t = (1-\lambda)\tilde{F}_{t-1} + \lambda F_t$$
For a window size of $W$, at time $t$ the value of the smoothed statistic $\tilde{F}_t$ depends on more than just the previous $W$ values. If $\lambda$, set by `lam`, is too small, thresholds may keep decreasing well past $2W - 1$ timesteps. To avoid this, the default `lam` is set to the high value of $\lambda=0.99$, meaning that the discreteness is still broken, but the value of the test statistic depends (almost) solely on the last $W$ observations. If more smoothing is desired, the `t_max` parameter can be manually set to a larger value.
Note
The detector must configure thresholds for each window size and each feature. This can be a time-consuming process if the number of features is high. For high-dimensional data, users are recommended to apply a dimension reduction step via `preprocess_fn`.
Specification of test-window sizes (the detector accepts multiple windows of different size $W$) is also required, with smaller windows allowing faster response to severe drift and larger windows allowing more power to detect slight drift. Since this detector requires a window to be full to function, the ERT is measured from `t = min(window_sizes)-1`.
Although this detector is primarily intended for univariate data, it can also be applied to multivariate data. In this case, the detector makes a correction similar to the Bonferroni correction used for the offline detector. Given $d$ features, the detector configures thresholds by targeting the $1-\beta$ quantile of test statistics over the simulated streams, where $\beta = 1 - (1-(1/ERT))^{(1/d)}$. For the univariate case, this simplifies to $\beta = 1/ERT$. At prediction time, drift is flagged if the test statistic of any feature stream exceeds its threshold.
Note
In the multivariate case, for the ERT to be accurately targeted the feature streams must be independent.
Arguments:
- `x_ref`: Data used as reference distribution.
- `ert`: The expected run-time in the absence of drift, starting from `t=min(window_sizes)`.
- `window_sizes`: The sizes of the sliding test-windows used to compute the test-statistics. Smaller windows focus on responding quickly to severe drift, larger windows focus on the ability to detect slight drift.
Keyword arguments:
- `preprocess_fn`: Function to preprocess the data before computing the data drift metrics.
- `n_bootstraps`: The number of bootstrap simulations used to configure the thresholds. The larger this is, the more accurately the desired ERT will be targeted. Should ideally be at least an order of magnitude larger than the ERT.
- `t_max`: Length of streams to simulate when configuring thresholds. If `None`, this is set to `2 * max(window_sizes) - 1`.
- `alternative`: Defines the alternative hypothesis. Options are `'greater'` (default) or `'less'`, corresponding to an increase or decrease in the mean of the Bernoulli stream.
- `lam`: Smoothing coefficient used for the exponential moving average. If heavy smoothing is applied (`lam << 1`), a larger `t_max` may be necessary in order to ensure the thresholds have converged.
- `n_features`: Number of features used in the FET test. No need to pass it if no preprocessing takes place. In case of a preprocessing step, this can also be inferred automatically but could be more expensive to compute.
- `verbose`: Whether or not to print progress during configuration.
- `input_shape`: Shape of input data.
- `data_type`: Optionally specify the data type (tabular, image or time-series). Added to metadata.
Initialized drift detector example:
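A minimal sketch, monitoring a hypothetical stream of binary correctness indicators:

```python
import numpy as np
from alibi_detect.cd import FETDriftOnline

# illustrative binary reference data, e.g. 0 = correct prediction, 1 = incorrect
x_ref = np.random.choice([0, 1], size=1000, p=[0.9, 0.1])

ert = 150
window_sizes = [100, 250]
cd = FETDriftOnline(x_ref, ert, window_sizes)
```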
We detect data drift by sequentially calling `predict` on single instances `x_t` (no batch dimension) as they each arrive. We can return the test-statistic and the threshold by setting `return_test_stat` to `True`.
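As a sketch, with an illustrative incoming instance:

```python
import numpy as np

x_t = np.random.choice([0, 1], size=1, p=[0.9, 0.1])  # single binary instance
preds = cd.predict(x_t, return_test_stat=True)
print(preds['data']['is_drift'], preds['data']['test_stat'], preds['data']['threshold'])
```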
The prediction takes the form of a dictionary with `meta` and `data` keys. `meta` contains the detector's metadata while `data` is also a dictionary which contains the actual predictions stored in the following keys:
- `is_drift`: 1 if any of the test-windows have drifted from the reference data and 0 otherwise.
- `time`: The number of observations that have so far been passed to the detector as test instances.
- `ert`: The expected run-time the detector was configured to run at in the absence of drift.
- `test_stat`: FET test-statistics (`1-p_val`) between the reference data and the test-windows if `return_test_stat` equals `True`.
- `threshold`: The values the test-statistics are required to exceed for drift to be detected if `return_test_stat` equals `True`.
The detector's state may be saved with the `save_state` method:
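```python
# the filepath is arbitrary
cd.save_state('./detector_state')
```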
The previously saved state may then be loaded via the `load_state` method:
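```python
# restore the state saved above
cd.load_state('./detector_state')
```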
The online Maximum Mean Discrepancy (MMD) detector is a kernel-based method for online drift detection. The MMD is a distance-based measure between 2 distributions $p$ and $q$ based on the mean embeddings $\mu_{p}$ and $\mu_{q}$ in a reproducing kernel Hilbert space $F$:

$$MMD(F, p, q) = \| \mu_{p} - \mu_{q} \|^2_{F}$$

Given reference samples $\{X_i\}_{i=1}^{N}$ and test samples $\{Y_i\}_{i=t}^{t+W}$ we may compute an unbiased estimate $\widehat{MMD}^2(F, \{X_i\}_{i=1}^N, \{Y_i\}_{i=t}^{t+W})$ of the squared MMD between the two underlying distributions. The estimate can be updated at low cost as new data points enter into the test-window. We use a Gaussian RBF kernel by default, but users are free to pass their own kernel of preference to the detector.
Online detectors assume the reference data is large and fixed and operate on single data points at a time (rather than batches). These data points are passed into the test-window and a two-sample test-statistic (in this case squared MMD) between the reference data and test-window is computed at each time-step. When the test-statistic exceeds a preconfigured threshold, drift is detected. Configuration of the thresholds requires specification of the expected run-time (ERT): the number of time-steps that the detector, on average, should run for in the absence of drift before making a false detection. It also requires specification of a test-window size, with smaller windows allowing faster response to severe drift and larger windows allowing more power to detect slight drift.
For high-dimensional data, we typically want to reduce the dimensionality before passing it to the detector. Following suggestions in Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift, we incorporate Untrained AutoEncoders (UAE) and black-box shift detection using the classifier's softmax outputs (BBSDs) as out-of-the-box preprocessing methods and note that PCA can also be easily implemented using `scikit-learn`. Preprocessing methods which do not rely on the classifier will usually pick up drift in the input data, while BBSDs focuses on label shift.
Detecting input data drift (covariate shift) $\Delta p(x)$ for text data requires a custom preprocessing step. We can pick up changes in the semantics of the input by extracting (contextual) embeddings and detecting drift on those. Strictly speaking we are not detecting $\Delta p(x)$ anymore since the whole training procedure (objective function, training data etc.) for the (pre)trained embeddings has an impact on the embeddings we extract. The library contains functionality to leverage pre-trained embeddings from HuggingFace's transformer package but also allows you to easily use your own embeddings of choice. Both options are illustrated with examples in the Text drift detection on IMDB movie reviews notebook.
Arguments:
- `x_ref`: Data used as reference distribution.
- `ert`: The expected run-time in the absence of drift, starting from `t=0`.
- `window_size`: The size of the sliding test-window used to compute the test-statistic. Smaller windows focus on responding quickly to severe drift, larger windows focus on the ability to detect slight drift.
Keyword arguments:
- `backend`: Backend used for the MMD implementation and configuration.
- `preprocess_fn`: Function to preprocess the data before computing the data drift metrics.
- `kernel`: Kernel used for the MMD computation, defaults to a Gaussian RBF kernel.
- `sigma`: Optionally set the GaussianRBF kernel bandwidth. Can also pass multiple bandwidth values as an array. The kernel evaluation is then averaged over those bandwidths. If `sigma` is not specified, the 'median heuristic' is adopted whereby `sigma` is set as the median pairwise distance between reference samples.
- `n_bootstraps`: The number of bootstrap simulations used to configure the thresholds. The larger this is, the more accurately the desired ERT will be targeted. Should ideally be at least an order of magnitude larger than the ERT.
- `verbose`: Whether or not to print progress during configuration.
- `input_shape`: Shape of input data.
- `data_type`: Optionally specify the data type (tabular, image or time-series). Added to metadata.
Additional PyTorch keyword arguments:
- `device`: Device type used. The default `None` tries to use the GPU and falls back on CPU if needed. Can be specified by passing either `'cuda'`, `'gpu'` or `'cpu'`. Only relevant for the `'pytorch'` backend.
Initialized drift detector example:
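A minimal sketch with the default TensorFlow backend; the data and settings are illustrative:

```python
import numpy as np
from alibi_detect.cd import MMDDriftOnline

x_ref = np.random.randn(1000, 10)  # illustrative reference data

ert = 50
window_size = 10
cd = MMDDriftOnline(x_ref, ert, window_size, backend='tensorflow', n_bootstraps=2500)
```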
The same detector in PyTorch:
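A sketch, reusing the variables from above:

```python
cd = MMDDriftOnline(x_ref, ert, window_size, backend='pytorch', n_bootstraps=2500)
```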
We can also easily add preprocessing functions for both frameworks. The following example uses a randomly initialized image encoder in PyTorch:
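A sketch, assuming 32x32x3 image inputs and reusing `x_ref`, `ert` and `window_size` from above; the encoder architecture is illustrative:

```python
from functools import partial
import torch
import torch.nn as nn
from alibi_detect.cd import MMDDriftOnline
from alibi_detect.cd.pytorch import preprocess_drift

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# randomly initialized encoder mapping 32x32x3 images to 32-dim vectors
encoder_net = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=0),
    nn.ReLU(),
    nn.Conv2d(64, 128, 4, stride=2, padding=0),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(128 * 6 * 6, 32)
).to(device).eval()

# preprocessing function applied to instances before the MMD computation
preprocess_fn = partial(preprocess_drift, model=encoder_net, device=device, batch_size=512)

cd = MMDDriftOnline(x_ref, ert, window_size, backend='pytorch', preprocess_fn=preprocess_fn)
```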
The same functionality is supported in TensorFlow and the main difference is that you would import `from alibi_detect.cd.tensorflow import preprocess_drift`. Other preprocessing steps such as the output of hidden layers of a model or extracted text embeddings using transformer models can be used in a similar way in both frameworks. TensorFlow example for the hidden layer output:
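A sketch, using a toy classifier in place of a trained model:

```python
from functools import partial
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, InputLayer
from alibi_detect.cd.tensorflow import HiddenOutput, preprocess_drift

# toy classifier; in practice this would be your trained model
clf = tf.keras.Sequential([
    InputLayer(input_shape=(32, 32, 3)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# use the output of the penultimate layer as the detector's input
preprocess_fn = partial(preprocess_drift, model=HiddenOutput(clf, layer=-2), batch_size=128)
```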
Alibi Detect also includes custom text preprocessing steps in both TensorFlow and PyTorch based on HuggingFace's transformers package.
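A sketch of the PyTorch version, assuming the `bert-base-cased` model (the layer selection is illustrative):

```python
from functools import partial
import torch
from transformers import AutoTokenizer
from alibi_detect.cd.pytorch import preprocess_drift
from alibi_detect.models.pytorch import TransformerEmbedding

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_name = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)

# embedding from the hidden states of the last 5 layers
embedding = TransformerEmbedding(
    model_name, embedding_type='hidden_state', layers=[-5, -4, -3, -2, -1]
).to(device).eval()

preprocess_fn = partial(preprocess_drift, model=embedding, tokenizer=tokenizer,
                        max_len=100, batch_size=32, device=device)
```

Again the same functionality is supported in TensorFlow but with the `from alibi_detect.cd.tensorflow import preprocess_drift` and `from alibi_detect.models.tensorflow import TransformerEmbedding` imports.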
We detect data drift by sequentially calling `predict` on single instances `x_t` (no batch dimension) as they each arrive. We can return the test-statistic and the threshold by setting `return_test_stat` to `True`.
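As a sketch, with an illustrative incoming instance:

```python
import numpy as np

x_t = np.random.randn(10)  # single instance, no batch dimension
preds = cd.predict(x_t, return_test_stat=True)
if preds['data']['is_drift']:
    print(f"Drift detected at time {preds['data']['time']}")
```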
The prediction takes the form of a dictionary with `meta` and `data` keys. `meta` contains the detector's metadata while `data` is also a dictionary which contains the actual predictions stored in the following keys:
- `is_drift`: 1 if the test-window (of the most recent `window_size` observations) has drifted from the reference data and 0 otherwise.
- `time`: The number of observations that have so far been passed to the detector as test instances.
- `ert`: The expected run-time the detector was configured to run at in the absence of drift.
- `test_stat`: MMD^2 metric between the reference data and the test-window if `return_test_stat` equals `True`.
- `threshold`: The value the test-statistic is required to exceed for drift to be detected if `return_test_stat` equals `True`.
The detector's state may be saved with the `save_state` method:
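```python
# the filepath is arbitrary
cd.save_state('./detector_state')
```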
The previously saved state may then be loaded via the `load_state` method:
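```python
# restore the state saved above
cd.load_state('./detector_state')
```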
At any point, the state may be reset to `t=0` with the `reset_state` method. When saving the detector with `save_detector`, the state will be saved, unless `t=0` (see here).
Check out the Online Drift Detection on the Wine Quality Dataset example for more details.

[1] Ross, G.J., Tasoulis, D.K. & Adams, N.M. Sequential monitoring of a Bernoulli sequence when the pre-change parameter is unknown. Comput Stat 28, 463–479 (2013).
The online Least Squares Density Difference (LSDD) detector is a non-parametric method for online drift detection. The LSDD between two distributions $p$ and $q$ on $\mathcal{X}$ is defined as

$$LSDD(p,q) = \int_{\mathcal{X}} (p(x) - q(x))^2 \,dx$$

and also has an empirical estimate $\widehat{LSDD}(\{X_i\}_{i=1}^N, \{Y_i\}_{i=t}^{t+W})$ that can be updated at low cost as the test window is updated to $\{Y_i\}_{i=t+1}^{t+1+W}$. The detector is motivated by, but is a modified version of, that proposed by Bu et al. (2017).
Online detectors assume the reference data is large and fixed and operate on single data points at a time (rather than batches). These data points are passed into the test-window and a two-sample test-statistic (in this case an estimate of LSDD) between the reference data and test-window is computed at each time-step. When the test-statistic exceeds a preconfigured threshold, drift is detected. Configuration of the thresholds requires specification of the expected run-time (ERT): the number of time-steps that the detector, on average, should run for in the absence of drift before making a false detection. It also requires specification of a test-window size, with smaller windows allowing faster response to severe drift and larger windows allowing more power to detect slight drift.
For high-dimensional data, we typically want to reduce the dimensionality before passing it to the detector. Following suggestions in Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift, we incorporate Untrained AutoEncoders (UAE) and black-box shift detection using the classifier's softmax outputs (BBSDs) as out-of-the-box preprocessing methods and note that PCA can also be easily implemented using `scikit-learn`. Preprocessing methods which do not rely on the classifier will usually pick up drift in the input data, while BBSDs focuses on label shift.
Detecting input data drift (covariate shift) $\Delta p(x)$ for text data requires a custom preprocessing step. We can pick up changes in the semantics of the input by extracting (contextual) embeddings and detect drift on those. Strictly speaking we are not detecting $\Delta p(x)$ anymore since the whole training procedure (objective function, training data etc) for the (pre)trained embeddings has an impact on the embeddings we extract. The library contains functionality to leverage pre-trained embeddings from HuggingFace's transformer package but also allows you to easily use your own embeddings of choice. Both options are illustrated with examples in the Text drift detection on IMDB movie reviews notebook.
Arguments:
- `x_ref`: Data used as reference distribution.
- `ert`: The expected run-time in the absence of drift, starting from `t=0`.
- `window_size`: The size of the sliding test-window used to compute the test-statistic. Smaller windows focus on responding quickly to severe drift, larger windows focus on the ability to detect slight drift.
Keyword arguments:
- `backend`: Backend used for the LSDD implementation and configuration.
- `preprocess_fn`: Function to preprocess the data before computing the data drift metrics.
- `sigma`: Optionally set the bandwidth of the Gaussian kernel used in estimating the LSDD. Can also pass multiple bandwidth values as an array. The kernel evaluation is then averaged over those bandwidths. If `sigma` is not specified, the 'median heuristic' is adopted whereby `sigma` is set as the median pairwise distance between reference samples.
- `n_bootstraps`: The number of bootstrap simulations used to configure the thresholds. The larger this is, the more accurately the desired ERT will be targeted. Should ideally be at least an order of magnitude larger than the ERT.
- `n_kernel_centers`: The number of reference samples to use as centers in the Gaussian kernel model used to estimate LSDD. Defaults to `2*window_size`.
- `lambda_rd_max`: The maximum relative difference between two estimates of LSDD that the regularization parameter lambda is allowed to cause. Defaults to 0.2 as in the paper.
- `verbose`: Whether or not to print progress during configuration.
- `input_shape`: Shape of input data.
- `data_type`: Optionally specify the data type (tabular, image or time-series). Added to metadata.
Additional PyTorch keyword arguments:
- `device`: Device type used. The default `None` tries to use the GPU and falls back on CPU if needed. Can be specified by passing either `'cuda'`, `'gpu'` or `'cpu'`. Only relevant for the `'pytorch'` backend.
Initialized drift detector example:
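A minimal sketch with the default TensorFlow backend; the data and settings are illustrative:

```python
import numpy as np
from alibi_detect.cd import LSDDDriftOnline

x_ref = np.random.randn(1000, 10)  # illustrative reference data

ert = 50
window_size = 10
cd = LSDDDriftOnline(x_ref, ert, window_size, backend='tensorflow', n_bootstraps=2500)
```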
The same detector in PyTorch:
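A sketch, reusing the variables from above:

```python
cd = LSDDDriftOnline(x_ref, ert, window_size, backend='pytorch', n_bootstraps=2500)
```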
We can also easily add preprocessing functions for both frameworks. The following example uses a randomly initialized image encoder in PyTorch:
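A sketch, assuming 32x32x3 image inputs and reusing `x_ref`, `ert` and `window_size` from above; the encoder architecture is illustrative:

```python
from functools import partial
import torch
import torch.nn as nn
from alibi_detect.cd import LSDDDriftOnline
from alibi_detect.cd.pytorch import preprocess_drift

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# randomly initialized encoder mapping 32x32x3 images to 32-dim vectors
encoder_net = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=0),
    nn.ReLU(),
    nn.Conv2d(64, 128, 4, stride=2, padding=0),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(128 * 6 * 6, 32)
).to(device).eval()

preprocess_fn = partial(preprocess_drift, model=encoder_net, device=device, batch_size=512)

cd = LSDDDriftOnline(x_ref, ert, window_size, backend='pytorch', preprocess_fn=preprocess_fn)
```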
The same functionality is supported in TensorFlow and the main difference is that you would import `from alibi_detect.cd.tensorflow import preprocess_drift`. Other preprocessing steps such as the output of hidden layers of a model or extracted text embeddings using transformer models can be used in a similar way in both frameworks. TensorFlow example for the hidden layer output:
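A sketch, using a toy classifier in place of a trained model:

```python
from functools import partial
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, InputLayer
from alibi_detect.cd.tensorflow import HiddenOutput, preprocess_drift

# toy classifier; in practice this would be your trained model
clf = tf.keras.Sequential([
    InputLayer(input_shape=(32, 32, 3)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# use the output of the penultimate layer as the detector's input
preprocess_fn = partial(preprocess_drift, model=HiddenOutput(clf, layer=-2), batch_size=128)
```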
Check out the Online Drift Detection on the Wine Quality Dataset example for more details.
Alibi Detect also includes custom text preprocessing steps in both TensorFlow and PyTorch based on Huggingface's transformers package:
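A sketch of the PyTorch version, assuming the `bert-base-cased` model (the layer selection is illustrative):

```python
from functools import partial
import torch
from transformers import AutoTokenizer
from alibi_detect.cd.pytorch import preprocess_drift
from alibi_detect.models.pytorch import TransformerEmbedding

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_name = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)

# embedding from the hidden states of the last 5 layers
embedding = TransformerEmbedding(
    model_name, embedding_type='hidden_state', layers=[-5, -4, -3, -2, -1]
).to(device).eval()

preprocess_fn = partial(preprocess_drift, model=embedding, tokenizer=tokenizer,
                        max_len=100, batch_size=32, device=device)
```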
Again the same functionality is supported in TensorFlow but with the `from alibi_detect.cd.tensorflow import preprocess_drift` and `from alibi_detect.models.tensorflow import TransformerEmbedding` imports.
We detect data drift by sequentially calling `predict` on single instances `x_t` (no batch dimension) as they each arrive. We can return the test-statistic and the threshold by setting `return_test_stat` to `True`.
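As a sketch, with an illustrative incoming instance:

```python
import numpy as np

x_t = np.random.randn(10)  # single instance, no batch dimension
preds = cd.predict(x_t, return_test_stat=True)
print(preds['data']['is_drift'], preds['data']['test_stat'], preds['data']['threshold'])
```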
The prediction takes the form of a dictionary with `meta` and `data` keys. `meta` contains the detector's metadata while `data` is also a dictionary which contains the actual predictions stored in the following keys:
- `is_drift`: 1 if the test-window (of the most recent `window_size` observations) has drifted from the reference data and 0 otherwise.
- `time`: The number of observations that have so far been passed to the detector as test instances.
- `ert`: The expected run-time the detector was configured to run at in the absence of drift.
- `test_stat`: LSDD metric between the reference data and the test-window if `return_test_stat` equals `True`.
- `threshold`: The value the test-statistic is required to exceed for drift to be detected if `return_test_stat` equals `True`.
The detector's state may be saved with the `save_state` method:
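```python
# the filepath is arbitrary
cd.save_state('./detector_state')
```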
The previously saved state may then be loaded via the `load_state` method:
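```python
# restore the state saved above
cd.load_state('./detector_state')
```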
At any point, the state may be reset to `t=0` with the `reset_state` method. When saving the detector with `save_detector`, the state will be saved, unless `t=0` (see here).