
Deployment

1. Dataset

CIFAR-10 consists of 60,000 32x32 RGB images equally distributed over 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.

#| code_folding: [0]
# imports and plot examples
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
y_train = y_train.astype('int64').reshape(-1,)
y_test = y_test.astype('int64').reshape(-1,)
print('Train: ', X_train.shape, y_train.shape)
print('Test: ', X_test.shape, y_test.shape)

plt.figure(figsize=(10, 10))
n = 4
for i in range(n ** 2):
    plt.subplot(n, n, i + 1)
    plt.imshow(X_train[i])
    plt.axis('off')
plt.show();

2. Outlier detection with a variational autoencoder (VAE)

Method

In a nutshell:

  • Train a VAE on normal data so it can reconstruct inliers well

  • The VAE cannot reconstruct an incoming request well? Outlier!

More resources on VAEs: the original paper and an excellent blog post

vae-lillog.png

Image source: https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html

Load detector or train from scratch

The pretrained outlier and adversarial detectors used in this notebook can be found here. You can use the built-in fetch_detector function, which saves the pretrained models in a local directory filepath and loads the detector. Alternatively, you can train a detector from scratch:
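A minimal sketch of both options is shown below. The encoder and decoder architectures, latent dimension and training settings are illustrative rather than the exact configuration behind the pretrained detector:

# fetch the pretrained VAE outlier detector or train one from scratch (sketch)
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, Dense, InputLayer, Reshape
from alibi_detect.od import OutlierVAE
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.utils.saving import save_detector

load_outlier_detector = True
filepath = './outlier_detector/'  # local directory to save/load the detector

if load_outlier_detector:
    od = fetch_detector(filepath, 'outlier', 'cifar10', 'OutlierVAE')
else:
    latent_dim = 1024  # illustrative latent dimension
    encoder_net = tf.keras.Sequential([
        InputLayer(input_shape=(32, 32, 3)),
        Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)
    ])
    decoder_net = tf.keras.Sequential([
        InputLayer(input_shape=(latent_dim,)),
        Dense(4 * 4 * 128),
        Reshape(target_shape=(4, 4, 128)),
        Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
    ])
    od = OutlierVAE(score_type='mse', encoder_net=encoder_net,
                    decoder_net=decoder_net, latent_dim=latent_dim, samples=2)
    od.fit(X_train, epochs=50, verbose=True)
    save_detector(od, filepath)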

Let's check whether the model manages to reconstruct the in-distribution training data:

Setting the threshold

Finding good threshold values can be tricky since they are typically not easy to interpret. The infer_threshold method helps to find a sensible value. We need to pass a batch of instances X and specify what percentage of those we consider to be normal via threshold_perc.
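A minimal sketch, assuming the detector od from above and treating 95% of the batch as normal:

# infer the outlier threshold from a batch of (mostly normal) instances
od.infer_threshold(X_train[:1000], threshold_perc=95, batch_size=128)
print('Inferred threshold:', od.threshold)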

Create and detect outliers

We can create some outliers by applying a random noise mask to the original instances:
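The helper below is a hypothetical stand-in for the masking used here: it pastes a patch of uniform noise into each image and then scores the perturbed batch with the detector:

import numpy as np

def mask_images(X, mask_size=8, noise_scale=1., seed=0):
    """Apply a random uniform-noise patch to each image (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X_mask = X.copy()
    for x in X_mask:
        r, c = rng.integers(0, 32 - mask_size, size=2)
        noise = noise_scale * rng.uniform(size=(mask_size, mask_size, 3))
        x[r:r + mask_size, c:c + mask_size, :] += noise.astype(np.float32)
    return np.clip(X_mask, 0., 1.)

X_outlier = mask_images(X_train[:100])
preds = od.predict(X_outlier, outlier_type='instance', return_instance_score=True)
print(preds['data']['is_outlier'][:10])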

Deploy the detector

For this example we use the open source deployment platform Seldon Core and the eventing-based project Knative, which allows serverless components to be connected to event streams. The Seldon Core payload logger sends events containing model requests to the Knative broker, which can farm these out to serverless components such as the outlier, drift or adversarial detection modules. Further eventing components can be added to feed off events produced by these components and send them onwards to, for example, alerting or storage modules. This all happens asynchronously.

deploy-diagram.png

We already configured a cluster on DigitalOcean with Seldon Core installed. The configuration steps to set everything up from scratch are detailed in this example notebook.

First we get the IP address of the Istio Ingress Gateway. This assumes Istio is installed with a LoadBalancer.
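In a notebook this can be done with a shell command, assuming kubectl is already configured against the cluster:

# IP address of the Istio ingress gateway (LoadBalancer service)
CLUSTER_IPS = !kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
CLUSTER_IP = CLUSTER_IPS[0]
print(CLUSTER_IP)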

We define some utility functions for the prediction of the deployed model.
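A minimal sketch of such a utility, assuming the model is exposed by Seldon Core under a deployment named cifar10 in the seldon namespace (adjust both names to your cluster):

import requests

def predict(x, deployment='cifar10', namespace='seldon', host=CLUSTER_IP):
    """Send a batch of images to the Seldon Core prediction endpoint."""
    url = f'http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions'
    payload = {'data': {'ndarray': x.tolist()}}
    resp = requests.post(url, json=payload)
    resp.raise_for_status()
    return resp.json()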

Let's make a prediction on the original instance:

Let's check the message dumper for the output of the outlier detector:

We then make a prediction on the perturbed instance:

Although the prediction is still correct, the instance is clearly an outlier:

3. Adversarial detection by matching prediction probabilities

Method

The adversarial detector is based on Adversarial Detection and Correction by Matching Prediction Distributions. Usually, autoencoders are trained to find a transformation $T$ that reconstructs the input instance $x$ as accurately as possible, using loss functions suited to capture the similarity between $x$ and $x'$ such as the mean squared reconstruction error. The novelty of the adversarial autoencoder (AE) detector lies in the use of a classification-model-dependent loss function, based on a distance metric in the output space of the model, to train the autoencoder network. Given a classification model $M$, we optimise the weights of the autoencoder such that the KL divergence between the model predictions on $x$ and on $x'$ is minimised. Without a reconstruction loss term, $x'$ simply tries to make sure that the prediction probabilities $M(x')$ and $M(x)$ match, without caring about the proximity of $x'$ to $x$. As a result, $x'$ is allowed to live in different areas of the input feature space than $x$, with different decision boundary shapes with respect to the model $M$. The carefully crafted adversarial perturbation which is effective around $x$ does not transfer to the new location of $x'$ in the feature space, and the attack is therefore neutralised. Training of the autoencoder is unsupervised since we only need access to the model prediction probabilities and the normal training instances. We do not require any knowledge about the underlying adversarial attack, and the classifier weights are frozen during training.

The detector can be used as follows:

  • An adversarial score $S$ is computed. $S$ equals the KL divergence between the model predictions on $x$ and $x'$.

  • If $S$ is above a threshold (explicitly defined or inferred from training data), the instance is flagged as adversarial.

  • For adversarial instances, the model $M$ uses the reconstructed instance $x'$ to make a prediction. If the adversarial score is below the threshold, the model makes a prediction on the original instance $x$.

This procedure is illustrated in the diagram below:

adversarialae.png

The method is very flexible and can also be used to detect common data corruptions and perturbations which negatively impact the model performance.

Utility functions

Rescale data

The ResNet classification model is trained on data standardised per instance:
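A minimal sketch of per-instance standardisation (the exact statistics used for the pretrained ResNet may differ):

import numpy as np

def scale_by_instance(X, eps=1e-12):
    """Standardise each image by its own mean and standard deviation."""
    mean = X.mean(axis=(1, 2, 3), keepdims=True)
    std = X.std(axis=(1, 2, 3), keepdims=True)
    return (X - mean) / (std + eps)

X_train_scaled = scale_by_instance(X_train)
X_test_scaled = scale_by_instance(X_test)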

Load pre-trained classifier

Check the predictions on the test set:

Adversarial attack

We investigate both Carlini-Wagner (C&W) and SLIDE attacks. You can simply load previously found adversarial instances for the pretrained ResNet-56 model. The attacks were generated using Foolbox:
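For reference, a hedged sketch of generating C&W adversarial instances with Foolbox 3 is shown below; it assumes a classifier clf that accepts inputs in the range passed in, and the attack hyperparameters are illustrative (the SLIDE attack can be run analogously via Foolbox's sparse L1 descent attack):

import tensorflow as tf
import foolbox as fb

# wrap the classifier for Foolbox; the bounds assume inputs scaled to [0, 1]
fmodel = fb.TensorFlowModel(clf, bounds=(0., 1.))
images = tf.convert_to_tensor(X_test[:32])
labels = tf.convert_to_tensor(y_test[:32])

# C&W L2 attack (number of steps kept small here for illustration)
attack = fb.attacks.L2CarliniWagnerAttack(steps=100)
_, X_adv_cw, success = attack(fmodel, images, labels, epsilons=1.)
print('Attack success rate:', success.numpy().mean())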

We can verify that the accuracy of the classifier drops to almost $0$%:

Let's visualise some adversarial instances:

Load or train and evaluate the adversarial detector

We can again either fetch the pretrained detector from a Google Cloud bucket or train one from scratch:
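A sketch of both options; the pretrained-detector name passed to fetch_detector and the autoencoder architecture are assumptions and may need adjusting to the released artefacts:

from alibi_detect.ad import AdversarialAE
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.utils.saving import save_detector

load_pretrained = True
filepath = './adversarial_detector/'

if load_pretrained:
    # detector name and model identifier are assumed here
    ad = fetch_detector(filepath, 'adversarial', 'cifar10', 'base', model='resnet56')
else:
    # train from scratch: encoder_net / decoder_net as in the VAE sketch above,
    # with the classifier clf frozen during training
    ad = AdversarialAE(encoder_net=encoder_net, decoder_net=decoder_net, model=clf)
    ad.fit(X_train_scaled, epochs=40, batch_size=64, verbose=True)
    save_detector(ad, filepath)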

The detector first reconstructs the input instances which can be adversarial. The reconstructed input is then fed to the classifier to compute the adversarial score. If the score is above a threshold, the instance is classified as adversarial and the detector tries to correct the attack. Let's investigate what happens when we reconstruct attacked instances and make predictions on them:

Accuracy on attacked vs. reconstructed instances:

The detector restores the accuracy after the attacks from almost $0$% to well over $80$%! We can compute the adversarial scores and inspect some of the reconstructed instances:

The ROC curves and AUC values show the effectiveness of the adversarial score to detect adversarial instances:

The threshold for the adversarial score can be set via infer_threshold. We need to pass a batch of instances $X$ and specify what percentage of those we consider to be normal via threshold_perc. Even if we only pass normal instances, some of them will have been misclassified by the model, leading to a higher score if the reconstruction picks up features from the correct class, and some might simply look adversarial in the first place. As a result, we set our threshold at $95$%:
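A minimal sketch, using a batch of normal (scaled) test instances:

# infer the adversarial threshold so that 95% of the normal batch scores below it
ad.infer_threshold(X_test_scaled[:1000], threshold_perc=95, batch_size=64)
print('Adversarial threshold:', ad.threshold)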

The correct method of the detector executes the procedure illustrated in the diagram above. First the adversarial score is computed. For instances where the score is above the threshold, the classifier prediction on the reconstructed instance is returned. Otherwise the original prediction is kept. The method returns a dictionary containing the metadata of the detector, whether the instances in the batch are adversarial (above the threshold) or not, the classifier predictions using the correction mechanism, and both the original and reconstructed predictions. Let's illustrate this on a batch containing some adversarial (C&W) and original test set instances:
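A sketch of the call, reusing the C&W instances from the attack sketch above; the exact keys of the returned dictionary may differ slightly between library versions:

import numpy as np

# mix adversarial (C&W) and original test set instances
X_batch = np.concatenate([np.asarray(X_adv_cw)[:16], X_test_scaled[:16]])
preds = ad.correct(X_batch)

print(preds['meta'])
print(preds['data']['is_adversarial'])  # adversarial flag per instance
print(preds['data']['corrected'])       # predictions after the correction mechanism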

Let's check the model performance:

This can be improved with the correction mechanism:

There are a few other tricks highlighted in the paper (temperature scaling and hidden layer KL divergence) and implemented in Alibi Detect which can further boost the adversarial detector's performance. Check this example notebook for more details.

4. Drift detection with Kolmogorov-Smirnov

Method

The drift detector applies feature-wise two-sample Kolmogorov-Smirnov (K-S) tests. For multivariate data, the obtained p-values for each feature are aggregated either via the Bonferroni or the False Discovery Rate (FDR) correction. The Bonferroni correction is more conservative and controls for the probability of at least one false positive. The FDR correction on the other hand allows for an expected fraction of false positives to occur.

For high-dimensional data, we typically want to reduce the dimensionality before computing the feature-wise univariate K-S tests and aggregating them via the chosen correction method. Following suggestions in Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift, we incorporate Untrained AutoEncoders (UAE), black-box shift detection using the classifier's softmax outputs (BBSDs) and PCA as out-of-the-box preprocessing methods. Preprocessing methods which do not rely on the classifier will usually pick up drift in the input data, while BBSDs focuses on label shift. The adversarial detector which is part of the library can also be transformed into a drift detector picking up drift that reduces the performance of the classification model. We can therefore combine different preprocessing techniques to figure out whether there is drift which hurts the model performance, and whether this drift can be classified as input drift or label shift.

Note that the library also has a drift detector based on the Maximum Mean Discrepancy and offers drift detection on text as well.

Dataset

We will use the CIFAR-10-C dataset (Hendrycks & Dietterich, 2019) to evaluate the drift detector. The instances in CIFAR-10-C come from the test set of CIFAR-10 but have been corrupted and perturbed by various types of noise, blur, brightness changes etc. at different levels of severity, leading to a gradual decline in the classification model's performance. We also check for drift against the original test set with class imbalances.

We can select from the following corruption types at 5 severity levels:

Let's pick a subset of the corruptions at corruption level 5. Each corruption type consists of perturbations on all of the original test set images.
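Alibi Detect ships utilities to list and fetch the corruptions; the subset chosen below is illustrative:

from alibi_detect.datasets import fetch_cifar10c, corruption_types_cifar10c

print(corruption_types_cifar10c())  # all available corruption types

corruptions = ['gaussian_noise', 'motion_blur', 'brightness', 'pixelate']
X_corr, y_corr = fetch_cifar10c(corruption=corruptions, severity=5, return_X_y=True)
X_corr = X_corr.astype('float32') / 255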

We split the original test set into a reference dataset and a dataset which should not be flagged as drift under the null hypothesis (H0) of the K-S test. We also split the corrupted data by corruption type:

We can visualise the same instance for each corruption type:

We can also verify that the performance of a ResNet-32 classification model on CIFAR-10 drops significantly on this perturbed dataset:

Given the drop in performance, it is important that we detect the harmful data drift!

Detect drift

We are trying to detect data drift on high-dimensional (32x32x3) data using an aggregation of univariate K-S tests. It therefore makes sense to apply dimensionality reduction first. Some dimensionality reduction methods also used in Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift are readily available: UAE (Untrained AutoEncoder), BBSDs (black-box shift detection using the classifier's softmax outputs) and PCA (using scikit-learn).

Untrained AutoEncoder

First we try UAE:
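A minimal sketch, assuming X_ref is the reference split from above; the encoder architecture and encoding dimension are illustrative:

from functools import partial
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, Flatten, InputLayer
from alibi_detect.cd import KSDrift
from alibi_detect.cd.tensorflow import preprocess_drift

# untrained autoencoder: reduce 32x32x3 images to a 32-dimensional encoding
encoding_dim = 32
encoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(32, 32, 3)),
    Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
    Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
    Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu),
    Flatten(),
    Dense(encoding_dim)
])
preprocess_fn = partial(preprocess_drift, model=encoder_net, batch_size=512)

# K-S drift detector on the reference data; cd.predict(X) returns a dict
# with the drift decision under preds['data']['is_drift']
cd = KSDrift(X_ref, p_val=.05, preprocess_fn=preprocess_fn)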

Let's check whether the detector thinks drift occurred within the original test set:

As expected, no drift occurred. We can also inspect the feature-wise K-S statistics, threshold value and p-values for each univariate K-S test by (encoded) feature before the multivariate correction. Most of them are well above the $0.05$ threshold:

Let's now check the predictions on the perturbed data:

BBSDs

For BBSDs, we use the classifier's softmax outputs for black-box shift detection. This method is based on Detecting and Correcting for Label Shift with Black Box Predictors.

Here we use the output of the softmax layer to detect the drift, but other hidden layers can be extracted as well by setting 'layer' to the index of the desired hidden layer in the model:
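A sketch assuming the classifier clf and the reference split X_ref from above:

from functools import partial
from alibi_detect.cd import KSDrift
from alibi_detect.cd.tensorflow import HiddenOutput, preprocess_drift

# layer=-1 extracts the softmax outputs; another index would extract a hidden layer
preprocess_fn = partial(preprocess_drift, model=HiddenOutput(clf, layer=-1), batch_size=128)
cd_bbsd = KSDrift(X_ref, p_val=.05, preprocess_fn=preprocess_fn)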

There is again no drift on the original held out test set:

We compare this with the perturbed data:

For more functionality and examples, such as updating the reference data with reservoir sampling or picking another multivariate correction mechanism, check out this example notebook.

Leveraging the adversarial detector for malicious drift detection

While monitoring covariate and predicted label shift is all very interesting and exciting, at the end of the day we are mainly interested in whether the drift actually hurt the model performance significantly. To this end, we can leverage the adversarial detector and measure univariate drift on the adversarial scores!

Make drift predictions on the original test set and corrupted data:
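A sketch of the idea, assuming the adversarial detector ad, the scaling utility from before and the corrupted data X_corr from the CIFAR-10-C sketch; the univariate K-S test then runs directly on the one-dimensional scores:

from alibi_detect.cd import KSDrift

# adversarial scores on the reference data and on the corrupted data
score_ref = ad.score(scale_by_instance(X_ref))
score_corr = ad.score(scale_by_instance(X_corr))

cd_scores = KSDrift(score_ref.reshape(-1, 1), p_val=.05)
preds_corr = cd_scores.predict(score_corr.reshape(-1, 1))
print('Drift on adversarial scores?', preds_corr['data']['is_drift'])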

We can therefore use the scores of the detector itself to quantify the harmfulness of the drift! We can generalise this to all the corruptions at each severity level in CIFAR-10-C.

The plot below shows the mean adversarial scores (left axis) and ResNet-32 accuracies (right axis) for increasing data corruption severity levels, together with the standard deviations of the scores per severity level. Level 0 corresponds to the original test set. Harmful scores are scores from instances which have been flipped from a correct to an incorrect prediction because of the corruption; not harmful means that the prediction was unchanged after the corruption. The chart can be reproduced in this notebook.

adversarialscores.png

Deploy

We can deploy the drift detector in a similar fashion to the outlier detector. For a more detailed step-by-step overview of the deployment process, check this notebook.

The deployed drift detector accumulates requests until a predefined drift_batch_size is reached, in our case $5000$, which is defined in the yaml for the deployment and set in the drift detector wrapper. After $5000$ instances, the batch is cleared and fills up again.

We now run the same test on some corrupted data:
