Scaling up drift detection with KeOps
Introduction
A number of convenient and powerful kernel-based drift detectors, such as the MMD detector (Gretton et al., 2012) or the learned kernel MMD detector (Liu et al., 2020), do not scale favourably with dataset size $n$: naive implementations have quadratic complexity $\mathcal{O}(n^2)$. As a result, we can quickly run into memory issues, since an efficient implementation of the permutation test requires storing the full $[N_\text{ref} + N_\text{test}, N_\text{ref} + N_\text{test}]$ kernel matrix (on the GPU if applicable). Here $N_\text{ref}$ denotes the reference data size and $N_\text{test}$ the test data size.
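To make the quadratic cost concrete, below is a minimal NumPy sketch of a (biased) MMD$^2$ estimate with a Gaussian RBF kernel that materialises the full kernel matrix. The function names are illustrative, not Alibi Detect internals:

```python
import numpy as np

def rbf_kernel(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    # pairwise squared distances -> dense [len(x), len(y)] kernel matrix
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    # materialises three dense kernel blocks: O((n_x + n_y)^2) memory,
    # which is exactly what blows up for large n
    k_xx = rbf_kernel(x, x, sigma)
    k_yy = rbf_kernel(y, y, sigma)
    k_xy = rbf_kernel(x, y, sigma)
    return float(k_xx.mean() + k_yy.mean() - 2 * k_xy.mean())
```

KeOps avoids exactly this dense storage by keeping the kernel matrix symbolic and evaluating reductions lazily.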
We can however drastically speed up and scale up kernel-based drift detectors to large dataset sizes by working with symbolic kernel matrices instead, leveraging the KeOps library to do so. For the user of $\texttt{Alibi Detect}$ the only thing that changes is the specification of the detector's backend, e.g. for the MMD detector:
```python
from alibi_detect.cd import MMDDrift

detector_torch = MMDDrift(x_ref, backend='pytorch')
detector_keops = MMDDrift(x_ref, backend='keops')
```

In this notebook we will run a few simple benchmarks to illustrate the speed and memory improvements from using KeOps over vanilla PyTorch on the GPU (1x RTX 2080 Ti) for both the standard MMD and learned kernel MMD detectors.
Data
We randomly sample points from the standard normal distribution and run the detectors with PyTorch and KeOps backends for the following settings:
$N_\text{ref}, N_\text{test} = [2, 5, 10, 20, 50, 100]$ (batch sizes in '000s)
$D = [2, 10, 50]$
where $D$ denotes the number of features.
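The sampling itself can be sketched as follows (the helper and variable names below are our own, chosen for illustration):

```python
import numpy as np

# batch sizes in '000s and feature dimensions from the settings above
batch_sizes = [2_000, 5_000, 10_000, 20_000, 50_000, 100_000]
n_features = [2, 10, 50]

def sample_data(n_ref: int, n_test: int, d: int, seed: int = 0):
    # draw reference and test sets from the standard normal distribution
    rng = np.random.default_rng(seed)
    x_ref = rng.standard_normal((n_ref, d)).astype(np.float32)
    x_test = rng.standard_normal((n_test, d)).astype(np.float32)
    return x_ref, x_test
```

Since both sets come from the same distribution, the detectors should (up to the test's significance level) not flag drift in these benchmarks.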
Requirements
The notebook requires PyTorch and KeOps to be installed. Once PyTorch is installed, KeOps can be installed via pip:
```bash
!pip install pykeops
```

Before we start, let's fix the random seeds for reproducibility:
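A minimal seed-fixing helper could look as follows (the seed value is an arbitrary choice; only the Python and NumPy RNGs are assumed here):

```python
import random
import numpy as np

def fix_seeds(seed: int = 2022) -> None:
    # fix the Python and NumPy RNGs; when using the PyTorch/KeOps backends,
    # additionally call torch.manual_seed(seed) (and, on GPU,
    # torch.cuda.manual_seed_all(seed))
    random.seed(seed)
    np.random.seed(seed)

fix_seeds(2022)
```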
Vanilla PyTorch vs. KeOps comparison
Utility functions
First we define some utility functions to run the experiments:
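One such utility is a simple timer around a detector's prediction call. The sketch below is our own and accepts any zero-argument callable, so it does not depend on the detector API:

```python
import time
import numpy as np

def time_fn(fn, n_repeat: int = 5):
    # run fn n_repeat times and return the mean and standard deviation
    # of the wall-clock runtimes in seconds
    times = []
    for _ in range(n_repeat):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return float(np.mean(times)), float(np.std(times))
```

It could then be used as e.g. `mean_t, std_t = time_fn(lambda: detector.predict(x_test))`, where `detector` and `x_test` stand in for the objects defined in the experiments.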
As detailed earlier, we will compare the PyTorch and KeOps implementations of the MMD and learned kernel MMD detectors for a variety of reference and test data batch sizes as well as different feature dimensions. Note that for the PyTorch implementation, the portion of the kernel matrix corresponding to the reference data can already be computed when the detector is initialised, and this computation is not included when we record the detector's prediction time. This precomputation cannot be amortised for the KeOps detector, since it works with lazily evaluated symbolic matrices. Since use cases where $N_\text{ref} >> N_\text{test}$ are quite common, we will also test this specific setting.
MMD detector
1. $N_\text{ref} = N_\text{test}$
Note that for KeOps we could further increase the number of instances in the reference and test sets (e.g. to 500,000) without running into memory issues.
Below we visualise the runtimes of the different experiments. We can make the following observations:
The relative speed improvements of KeOps over vanilla PyTorch increase with increasing batch size.
Due to the explicit kernel matrix computation and storage, the PyTorch detector runs out of memory at a little over 10,000 instances in each of the reference and test sets, while KeOps keeps scaling up without issues.
The relative speed improvements decline with growing feature dimension. Note however that we would not recommend using an (untrained) MMD detector on very high-dimensional data in the first place.
The plots show both the absolute and relative (PyTorch / KeOps) mean prediction times for the MMD drift detector for different feature dimensions $[2, 10, 50]$.
The difference between KeOps and PyTorch is even more striking when we only look at $[2, 10]$ features:
2. $N_\text{ref} >> N_\text{test}$
Now we check whether the speed improvements still hold when $N_\text{ref} >> N_\text{test}$ ($N_\text{ref} / N_\text{test} = 10$) and a large part of the kernel can already be computed at initialisation time of the PyTorch (but not the KeOps) detector.
The below plots illustrate that KeOps indeed still provides large speed ups over PyTorch. The x-axis shows the reference batch size $N_\text{ref}$. Note that $N_\text{ref} / N_\text{test} = 10$.
Learned kernel MMD detector
We conduct experiments similar to those for the MMD detector, with $N_\text{ref} = N_\text{test}$ and n_features=50. We use a deep learned kernel consisting of an MLP followed by Gaussian RBF kernels, projecting the input features onto a d_out=2-dimensional space. Since the learned kernel detector computes the kernel matrix batch-wise, we can also scale up the number of instances for the PyTorch backend without running out of memory.
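The structure of such a deep kernel can be sketched in NumPy: project the inputs with an MLP (here untrained, with randomly initialised weights) to 2 dimensions and combine an RBF kernel on the projections with an RBF kernel on the raw inputs, i.e. $k(x, y) = (1-\epsilon)\,k_a(\Phi(x), \Phi(y)) + \epsilon\,k_b(x, y)$. This is an illustrative re-implementation of the idea, not the Alibi Detect `DeepKernel` API:

```python
import numpy as np

def rbf(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mlp_proj(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    # one hidden ReLU layer, projecting to d_out = w2.shape[1] dimensions
    return np.maximum(x @ w1, 0.0) @ w2

def deep_kernel(x, y, w1, w2, eps: float = 0.01) -> np.ndarray:
    # (1 - eps) * kernel on the learned projections + eps * kernel on raw inputs
    k_a = rbf(mlp_proj(x, w1, w2), mlp_proj(y, w1, w2))
    k_b = rbf(x, y)
    return (1 - eps) * k_a + eps * k_b
```

In the actual detector the MLP weights (and possibly $\epsilon$) are trained to maximise test power on a held-out split; here they merely illustrate the kernel's shape.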
We again plot the absolute and relative (PyTorch / KeOps) mean prediction times for the learned kernel MMD drift detector for different feature dimensions:
Conclusion
As illustrated in the experiments, KeOps allows you to drastically speed up and scale up drift detection to larger datasets without running into memory issues. The speed benefit of KeOps over the PyTorch (or TensorFlow) MMD detectors decreases as the number of features increases. Note though that it is not advisable to apply the (untrained) MMD detector to very high-dimensional data in the first place, and that for the learned kernel MMD detector we can apply dimensionality reduction via the deep kernel.