Outlier Detection

Methods

Mahalanobis Distance

Overview

The Mahalanobis online outlier detector aims to predict anomalies in tabular data. The algorithm calculates an outlier score, which is a measure of distance from the center of the feature distribution (Mahalanobis distance). If this outlier score is higher than a user-defined threshold, the observation is flagged as an outlier. The algorithm is online, which means that it starts without knowledge about the distribution of the features and learns as requests arrive. Consequently, you should expect unreliable outlier scores at the start that improve as more observations are processed. The algorithm is suitable for low to medium dimensional tabular data.
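The online behaviour described above can be illustrated with a minimal sketch, assuming only two features and a plain running mean and covariance. The class name OnlineMahalanobis2D is hypothetical and this is not the library's implementation (which also supports principal components, clipping and max_n); it only shows why scores are unreliable until enough observations have been seen.

```python
import math
import random


class OnlineMahalanobis2D:
    """Running mean and covariance for two features, updated per observation."""

    def __init__(self):
        self.n = 0
        self.mean = [0.0, 0.0]
        # Accumulated products of deviations (Welford-style update).
        self.m2 = [[0.0, 0.0], [0.0, 0.0]]

    def update(self, x):
        self.n += 1
        before = [x[i] - self.mean[i] for i in range(2)]
        self.mean = [self.mean[i] + before[i] / self.n for i in range(2)]
        after = [x[i] - self.mean[i] for i in range(2)]
        for i in range(2):
            for j in range(2):
                self.m2[i][j] += before[i] * after[j]

    def score(self, x):
        """Mahalanobis distance of x from the running mean (0.0 until estimable)."""
        if self.n < 3:
            return 0.0
        cov = [[self.m2[i][j] / (self.n - 1) for j in range(2)] for i in range(2)]
        det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
        if det <= 1e-12:
            return 0.0
        d0, d1 = x[0] - self.mean[0], x[1] - self.mean[1]
        # Quadratic form d^T cov^{-1} d using the closed-form 2x2 inverse.
        q = (cov[1][1] * d0 * d0 - 2 * cov[0][1] * d0 * d1 + cov[0][0] * d1 * d1) / det
        return math.sqrt(max(q, 0.0))


rng = random.Random(0)
detector = OnlineMahalanobis2D()
for _ in range(200):
    point = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    detector.score(point)   # score the incoming observation first ...
    detector.update(point)  # ... then learn from it

near_score = detector.score([0.0, 0.0])
far_score = detector.score([8.0, 8.0])
print(near_score, far_score)
```

A point far from the bulk of the stream receives a much larger distance than a point near the running mean, and comparing that distance with the threshold gives the outlier flag.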

The algorithm is also able to include categorical variables. The fit step first computes pairwise distances between the categories of each categorical variable. The pairwise distances are based on either the model predictions (MVDM method) or the context provided by the other variables in the dataset (ABDM method). For MVDM, we use the difference between the conditional model prediction probabilities of each category. This method is based on the Modified Value Difference Metric (MVDM) by Cost et al. (1993). ABDM stands for Association-Based Distance Metric, a categorical distance measure introduced by Le et al. (2005). ABDM infers context from the presence of other variables in the data and computes a dissimilarity measure based on the Kullback-Leibler divergence. Both methods can also be combined as ABDM-MVDM. We can then apply multidimensional scaling to project the pairwise distances into Euclidean space.
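As a rough illustration of the MVDM idea, the sketch below computes a pairwise distance between categories as the absolute difference between their conditional class probabilities. The function name mvdm_distances and the toy data are hypothetical, and the choice of an L1 difference is an assumption; the library's exact formula may differ.

```python
from collections import Counter, defaultdict


def mvdm_distances(categories, labels):
    """Pairwise category distances from differences in P(label | category)."""
    counts = defaultdict(Counter)
    for category, label in zip(categories, labels):
        counts[category][label] += 1
    classes = sorted(set(labels))
    # Conditional class probabilities for each category.
    probs = {
        category: [cnt[c] / sum(cnt.values()) for c in classes]
        for category, cnt in counts.items()
    }
    # L1 difference between the probability vectors of each pair.
    return {
        (a, b): sum(abs(pa - pb) for pa, pb in zip(probs[a], probs[b]))
        for a in probs
        for b in probs
    }


colour = ["red", "red", "blue", "blue", "green", "green"]
label = [1, 1, 1, 0, 0, 0]
dist = mvdm_distances(colour, label)
print(dist[("red", "blue")], dist[("red", "green")])
```

Categories that lead to similar predictions end up close together, which is what allows the later multidimensional scaling step to place them on a numerical axis.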

Usage

Initialize

Parameters:

threshold: Mahalanobis distance threshold above which the instance is flagged as an outlier.
n_components: number of principal components used.
std_clip: feature-wise standard deviation used to clip the observations before updating the mean and covariance matrix.
start_clip: number of observations before clipping is applied.
max_n: algorithm behaves as if it has seen at most max_n points.
cat_vars: dictionary mapping each categorical column (key) to its number of categories (value). Only needed if categorical variables are present.
ohe: boolean whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import Mahalanobis

od = Mahalanobis(
    threshold=10.,
    n_components=2,
    std_clip=3,
    start_clip=100
)

Fit

We only need to fit the outlier detector if there are categorical variables present in the data. The following parameters can be specified:

X: training batch as a numpy array.
y: model class predictions or ground truth labels for X. Used for 'mvdm' and 'abdm-mvdm' pairwise distance metrics. Not needed for 'abdm'.
d_type: pairwise distance metric used for categorical variables. Currently, 'abdm', 'mvdm' and 'abdm-mvdm' are supported. 'abdm' infers context from the other variables while 'mvdm' uses the model predictions. 'abdm-mvdm' is a weighted combination of the two metrics.
w: weight on 'abdm' (between 0. and 1.) distance if d_type equals 'abdm-mvdm'.
disc_perc: list with percentiles used in binning of numerical features used for the 'abdm' and 'abdm-mvdm' pairwise distance measures.
standardize_cat_vars: standardize numerical values of categorical variables if True.
feature_range: tuple with min and max ranges to allow for numerical values of categorical variables. Min and max ranges can be floats or numpy arrays with dimension (1, number of features) for feature-wise ranges.
smooth: smoothing exponent between 0 and 1 for the distances. Lower values will smooth the difference in distance metric between different features.
center: whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

od.fit(
    X_train,
    d_type='abdm',
    disc_perc=[25, 50, 75]
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Beware though that the outlier detector is stateful and every call to the score function will update the mean and covariance matrix, even when inferring the threshold.
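The idea behind inferring a threshold from threshold_perc can be sketched as taking a percentile of the outlier scores of the batch. The helper percentile below is a hypothetical stand-in written with the standard library; the scores are toy values, not detector output.

```python
def percentile(values, perc):
    """Percentile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * perc / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


# Toy outlier scores for a batch in which roughly 95% of instances are normal.
scores = [float(i) for i in range(1, 101)]
threshold = percentile(scores, 95)
flagged = [s for s in scores if s > threshold]
print(threshold, len(flagged))
```

With threshold_perc=95, about 5% of the batch ends up above the threshold, which matches the assumed share of outliers.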

Detect

We detect outliers by simply calling predict on a batch of instances X to compute the instance level Mahalanobis distances. We can also return the instance level outlier score by setting return_instance_score to True.

The prediction takes the form of a dictionary with meta and data keys. meta contains the detector's metadata while data is also a dictionary which contains the actual predictions stored in the following keys:

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (batch size,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    return_instance_score=True
)
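The shape of the returned dictionary can be mimicked with a small sketch. The helper make_prediction and the metadata content are hypothetical and only illustrate how is_outlier follows from the instance scores and the threshold.

```python
def make_prediction(scores, threshold, return_instance_score=True):
    """Build a dictionary shaped like the detector output described above."""
    data = {
        # 1 where the score exceeds the threshold, 0 otherwise.
        "is_outlier": [int(score > threshold) for score in scores],
        "instance_score": list(scores) if return_instance_score else None,
    }
    # Illustrative metadata only; the real detector fills this in itself.
    return {"meta": {"data_type": "tabular"}, "data": data}


example = make_prediction([2.0, 15.0, 9.9], threshold=10.0)
print(example["data"]["is_outlier"])
```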

Examples

Tabular

Outlier detection on KDD Cup 99

Isolation Forest

Overview

Isolation forests (IF) are tree based models specifically used for outlier detection. The IF isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. The number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node. This path length, averaged over a forest of random trees, is a measure of normality and is used to define an anomaly score. Outliers can typically be isolated quicker, leading to shorter paths. The algorithm is suitable for low to medium dimensional tabular data.
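The isolation mechanism can be sketched in a few lines, assuming a simplified setting without subsampling or score normalisation. The functions path_length and mean_path_length are hypothetical helpers, not part of the library.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=50):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))
    low = min(row[feature] for row in data)
    high = max(row[feature] for row in data)
    if low == high:
        return depth
    split = rng.uniform(low, high)
    # Keep only the rows that fall on the same side of the split as the point.
    if point[feature] < split:
        side = [row for row in data if row[feature] < split]
    else:
        side = [row for row in data if row[feature] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average path length over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


rng = random.Random(1)
inliers = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
outlier = (8.0, 8.0)
data = inliers + [outlier]
typical = min(inliers, key=lambda p: p[0] ** 2 + p[1] ** 2)

outlier_depth = mean_path_length(outlier, data)
typical_depth = mean_path_length(typical, data)
print(outlier_depth, typical_depth)
```

The isolated point needs far fewer splits on average than a point in the dense region, which is the signal the anomaly score is built on.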

Usage

Initialize

Parameters:

threshold: threshold value for the outlier score above which the instance is flagged as an outlier.
n_estimators: number of base estimators in the ensemble. Defaults to 100.
max_samples: number of samples to draw from the training data to train each base estimator. If int, draw max_samples samples. If float, draw max_samples times the number of training samples. If 'auto', max_samples = min(256, number of samples).
max_features: number of features to draw from the training data to train each base estimator. If int, draw max_features features. If float, draw max_features times the number of features.
bootstrap: whether to fit individual trees on random subsets of the training data, sampled with replacement.
n_jobs: number of jobs to run in parallel for fit and predict.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import IForest

od = IForest(
    threshold=0.,
    n_estimators=100
)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array.
sample_weight: array with shape (batch size,) used to assign different weights to each instance during training. Defaults to None.

od.fit(
    X_train
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Detect

We detect outliers by simply calling predict on a batch of instances X to compute the instance level outlier scores. We can also return the instance level outlier score by setting return_instance_score to True.

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (batch size,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    return_instance_score=True
)

Examples

Tabular

Outlier detection on KDD Cup 99

Variational Auto-Encoder

Overview

The Variational Auto-Encoder (VAE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised or semi-supervised training is desirable since labeled data is often scarce. The VAE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is either measured as the mean squared error (MSE) between the input and the reconstructed instance or as the probability that both the input and the reconstructed instance are generated by the same process. The algorithm is suitable for tabular and image data.
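The reconstruction-error logic can be sketched without a neural network. Here toy_reconstruct is a hypothetical stand-in for a trained auto-encoder that can only reproduce values in the range it saw during training, and mse_score is the instance level MSE.

```python
def mse_score(x, x_recon):
    """Mean squared error between an instance and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)


def toy_reconstruct(x, low=0.0, high=1.0):
    """Stand-in for a trained model: values outside [low, high] cannot be reproduced."""
    return [min(max(value, low), high) for value in x]


threshold = 0.1
normal = [0.2, 0.5, 0.9]
anomalous = [0.2, 3.0, 0.9]

normal_score = mse_score(normal, toy_reconstruct(normal))
anomalous_score = mse_score(anomalous, toy_reconstruct(anomalous))
print(normal_score > threshold, anomalous_score > threshold)
```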

Usage

Initialize

Parameters:

threshold: threshold value above which the instance is flagged as an outlier.
score_type: scoring method used to detect outliers. Currently only the default 'mse' is supported.
latent_dim: latent dimension of the VAE.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

encoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(32, 32, 3)),
      Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)
  ])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

decoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(latent_dim,)),
      Dense(4*4*128),
      Reshape(target_shape=(4, 4, 128)),
      Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
  ])

vae: instead of using a separate encoder and decoder, the VAE can also be passed as a tf.keras.Model.
samples: number of samples drawn during detection for each instance to detect.
beta: weight on the KL-divergence loss term following the $\beta$-VAE framework. Default equals 1.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import OutlierVAE

od = OutlierVAE(
    threshold=0.1,
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    latent_dim=1024,
    samples=10
)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
loss_fn: loss function used for training. Defaults to the elbo loss.
optimizer: optimizer used for training. Defaults to Adam with learning rate 1e-3.
cov_elbo: dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)) which is the default.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(
    X_train,
    epochs=50
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Detect

We detect outliers by simply calling predict on a batch of instances X. Detection can be customized via the following parameters:

outlier_type: either 'instance' or 'feature'. If the outlier type equals 'instance', the outlier score at the instance level will be used to classify the instance as an outlier or not. If 'feature' is selected, outlier detection happens at the feature level (e.g. by pixel in images).
outlier_perc: percentage of the sorted (descending) feature level outlier scores. We might for instance want to flag an image as an outlier if at least 20% of the pixel values are on average above the threshold. In this case, we set outlier_perc to 20. The default value is 100 (using all the features).
return_feature_score: boolean whether to return the feature level outlier scores.
return_instance_score: boolean whether to return the instance level outlier scores.

is_outlier: boolean whether instances or features are above the threshold and therefore outliers. If outlier_type equals 'instance', then the array is of shape (batch size,). If it equals 'feature', then the array is of shape (batch size, instance shape).
feature_score: contains feature level scores if return_feature_score equals True.
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    outlier_type='instance',
    outlier_perc=75,
    return_feature_score=True,
    return_instance_score=True
)
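The effect of outlier_perc can be sketched as follows: the feature level scores are sorted in descending order, the top outlier_perc percent are kept, and their mean becomes the instance level score. The function instance_score and the toy scores are illustrative assumptions rather than the library's code.

```python
import math


def instance_score(feature_scores, outlier_perc=100.0):
    """Mean of the highest `outlier_perc` percent of feature level scores."""
    n_keep = max(1, math.ceil(outlier_perc * len(feature_scores) / 100.0))
    top = sorted(feature_scores, reverse=True)[:n_keep]
    return sum(top) / len(top)


# Ten feature scores: two features reconstruct badly, eight reconstruct well.
fscore = [0.9, 0.8, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
score_all = instance_score(fscore, outlier_perc=100)
score_top = instance_score(fscore, outlier_perc=20)
print(score_all, score_top)
```

Lower values of outlier_perc make the instance score more sensitive to a small number of badly reconstructed features.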

Examples

Image

Outlier detection on CIFAR10

Tabular

Outlier detection on KDD Cup 99

Outlier detection on Adult dataset

Auto-Encoder

Overview

The Auto-Encoder (AE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The AE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.

Usage

Initialize

Parameters:

threshold: threshold value above which the instance is flagged as an outlier.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

encoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(32, 32, 3)),
      Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)
  ])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

decoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(1024,)),
      Dense(4*4*128),
      Reshape(target_shape=(4, 4, 128)),
      Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
  ])

ae: instead of using a separate encoder and decoder, the AE can also be passed as a tf.keras.Model.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import OutlierAE

od = OutlierAE(threshold=0.1,
               encoder_net=encoder_net,
               decoder_net=decoder_net)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
loss_fn: loss function used for training. Defaults to the Mean Squared Error loss.
optimizer: optimizer used for training. Defaults to Adam with learning rate 1e-3.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(X_train, epochs=50)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(X, threshold_perc=95)

Detect

We detect outliers by simply calling predict on a batch of instances X. Detection can be customized via the following parameters:

outlier_type: either 'instance' or 'feature'. If the outlier type equals 'instance', the outlier score at the instance level will be used to classify the instance as an outlier or not. If 'feature' is selected, outlier detection happens at the feature level (e.g. by pixel in images).
outlier_perc: percentage of the sorted (descending) feature level outlier scores. We might for instance want to flag an image as an outlier if at least 20% of the pixel values are on average above the threshold. In this case, we set outlier_perc to 20. The default value is 100 (using all the features).
return_feature_score: boolean whether to return the feature level outlier scores.
return_instance_score: boolean whether to return the instance level outlier scores.

is_outlier: boolean whether instances or features are above the threshold and therefore outliers. If outlier_type equals 'instance', then the array is of shape (batch size,). If it equals 'feature', then the array is of shape (batch size, instance shape).
feature_score: contains feature level scores if return_feature_score equals True.
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(X,
                   outlier_type='instance',
                   outlier_perc=75,
                   return_feature_score=True,
                   return_instance_score=True)

Examples

Image

Outlier detection on CIFAR10

Variational Auto-Encoding Gaussian Mixture Model

Overview

The Variational Auto-Encoding Gaussian Mixture Model (VAEGMM) Outlier Detector follows the Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection paper but with a VAE instead of a regular Auto-Encoder. The encoder compresses the data while the reconstructed instances generated by the decoder are used to create additional features based on the reconstruction error between the input and the reconstructions. These features are combined with encodings and fed into a Gaussian Mixture Model (GMM). The VAEGMM outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised or semi-supervised training is desirable since labeled data is often scarce. The sample energy of the GMM can then be used to determine whether an instance is an outlier (high sample energy) or not (low sample energy). The algorithm is suitable for tabular and image data.
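The sample energy can be sketched for a one-dimensional mixture as the negative log of the mixture likelihood. The mixture parameters below are made up for illustration; in the detector they are estimated from the encodings and reconstruction features.

```python
import math


def gaussian_pdf(z, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def sample_energy(z, weights, means, variances):
    """Negative log-likelihood of z under a Gaussian mixture."""
    likelihood = sum(
        w * gaussian_pdf(z, m, v) for w, m, v in zip(weights, means, variances)
    )
    # Small constant keeps the logarithm finite for very unlikely points.
    return -math.log(likelihood + 1e-12)


weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
energy_inlier = sample_energy(2.0, weights, means, variances)
energy_outlier = sample_energy(10.0, weights, means, variances)
print(energy_inlier, energy_outlier)
```

A point near one of the mixture components has low energy, while a point far from every component has high energy and would exceed a threshold such as 7.5.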

Usage

Initialize

Parameters:

threshold: threshold value for the sample energy above which the instance is flagged as an outlier.
latent_dim: latent dimension of the VAE.
n_gmm: number of components in the GMM.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

encoder_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(n_features,)),
    Dense(60, activation=tf.nn.tanh),
    Dense(30, activation=tf.nn.tanh),
    Dense(10, activation=tf.nn.tanh),
    Dense(latent_dim, activation=None)
])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

decoder_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(latent_dim,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(30, activation=tf.nn.tanh),
    Dense(60, activation=tf.nn.tanh),
    Dense(n_features, activation=None)
])

gmm_density_net: layers for the GMM network wrapped in a tf.keras.Sequential class. Example:

gmm_density_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(latent_dim + 2,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(n_gmm, activation=tf.nn.softmax)
])

vaegmm: instead of using a separate encoder, decoder and GMM density net, the VAEGMM can also be passed as a tf.keras.Model.
samples: number of samples drawn during detection for each instance to detect.
beta: weight on the KL-divergence loss term following the $\beta$-VAE framework. Default equals 1.
recon_features: function to extract features from the reconstructed instance by the decoder. Defaults to a combination of the mean squared reconstruction error and the cosine similarity between the original and reconstructed instances by the VAE.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import OutlierVAEGMM

od = OutlierVAEGMM(
    threshold=7.5,
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    gmm_density_net=gmm_density_net,
    latent_dim=4,
    n_gmm=2,
    samples=10
)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
loss_fn: loss function used for training. Defaults to the custom VAEGMM loss which is a combination of the elbo loss, sample energy of the GMM and a loss term penalizing small values on the diagonals of the covariance matrices in the GMM to avoid trivial solutions. It is important to balance the loss weights below so no single loss term dominates during the optimization.
w_recon: weight on elbo loss term. Defaults to 1e-7.
w_energy: weight on sample energy loss term. Defaults to 0.1.
w_cov_diag: weight on covariance diagonals. Defaults to 0.005.
optimizer: optimizer used for training. Defaults to Adam with learning rate 1e-4.
cov_elbo: dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)) which is the default.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(
    X_train,
    epochs=10,
    batch_size=1024
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Detect

We detect outliers by simply calling predict on a batch of instances X to compute the instance level sample energies. We can also return the instance level outlier score by setting return_instance_score to True.

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (batch size,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    return_instance_score=True
)

Examples

Tabular

Outlier detection on KDD Cup 99

Auto-Encoding Gaussian Mixture Model

Overview

The Auto-Encoding Gaussian Mixture Model (AEGMM) Outlier Detector follows the Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection paper. The encoder compresses the data while the reconstructed instances generated by the decoder are used to create additional features based on the reconstruction error between the input and the reconstructions. These features are combined with encodings and fed into a Gaussian Mixture Model (GMM). The AEGMM outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised or semi-supervised training is desirable since labeled data is often scarce. The sample energy of the GMM can then be used to determine whether an instance is an outlier (high sample energy) or not (low sample energy). The algorithm is suitable for tabular and image data.

Usage

Initialize

Parameters:

threshold: threshold value for the sample energy above which the instance is flagged as an outlier.
n_gmm: number of components in the GMM.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

encoder_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(n_features,)),
    Dense(60, activation=tf.nn.tanh),
    Dense(30, activation=tf.nn.tanh),
    Dense(10, activation=tf.nn.tanh),
    Dense(latent_dim, activation=None)
])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

decoder_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(latent_dim,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(30, activation=tf.nn.tanh),
    Dense(60, activation=tf.nn.tanh),
    Dense(n_features, activation=None)
])

gmm_density_net: layers for the GMM network wrapped in a tf.keras.Sequential class. Example:

gmm_density_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(latent_dim + 2,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(n_gmm, activation=tf.nn.softmax)
])

aegmm: instead of using a separate encoder, decoder and GMM density net, the AEGMM can also be passed as a tf.keras.Model.
recon_features: function to extract features from the reconstructed instance by the decoder. Defaults to a combination of the mean squared reconstruction error and the cosine similarity between the original and reconstructed instances by the AE.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.
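The default reconstruction features described for recon_features can be sketched as two numbers per instance, which explains the latent_dim + 2 input size of the GMM density net. The function below and the toy encoding are hypothetical and only approximate what the library computes.

```python
import math


def recon_features(x, x_recon):
    """Mean squared reconstruction error and cosine similarity."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)
    dot = sum(a * b for a, b in zip(x, x_recon))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in x_recon))
    cosine = dot / norm if norm > 0 else 0.0
    return [mse, cosine]


x = [1.0, 2.0, 3.0]
x_recon = [1.1, 1.9, 3.2]
latent = [0.3, -1.2]  # hypothetical encoding with latent_dim = 2

# Encoding and reconstruction features are concatenated before the GMM density net.
gmm_input = latent + recon_features(x, x_recon)
print(gmm_input)
```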

Initialized outlier detector example:

from alibi_detect.od import OutlierAEGMM

od = OutlierAEGMM(
    threshold=7.5,
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    gmm_density_net=gmm_density_net,
    n_gmm=2
)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
loss_fn: loss function used for training. Defaults to the custom AEGMM loss which is a combination of the mean squared reconstruction error, the sample energy of the GMM and a loss term penalizing small values on the diagonals of the covariance matrices in the GMM to avoid trivial solutions. It is important to balance the loss weights below so no single loss term dominates during the optimization.
w_energy: weight on sample energy loss term. Defaults to 0.1.
w_cov_diag: weight on covariance diagonals. Defaults to 0.005.
optimizer: optimizer used for training. Defaults to Adam with learning rate 1e-4.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(
    X_train,
    epochs=10,
    batch_size=1024
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Detect

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (batch size,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    return_instance_score=True
)

Examples

Tabular

Outlier detection on KDD Cup 99

Likelihood Ratios for Outlier Detection

Overview

The outlier detector described by Ren et al. (2019) in Likelihood Ratios for Out-of-Distribution Detection uses the likelihood ratio (LLR) between 2 generative models as the outlier score. One model is trained on the original data while the other is trained on a perturbed version of the dataset. This is based on the observation that the log likelihood for an instance under a generative model can be heavily affected by population level background statistics. The second generative model is therefore trained to capture the background statistics still present in the perturbed data while the semantic features have been erased by the perturbations.
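The scoring rule can be sketched with two univariate Gaussians standing in for the generative models: a narrow one for the model trained on the original data and a wide one for the background model. Both distributions and the function names are assumptions chosen for illustration.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def llr_outlier_score(x, semantic, background):
    """Negative log likelihood ratio: higher means more outlying."""
    llr = gaussian_log_prob(x, *semantic) - gaussian_log_prob(x, *background)
    return -llr


semantic = (0.0, 1.0)    # (mean, std) of the model fit on the original data
background = (0.0, 3.0)  # (mean, std) of the model fit on perturbed data

score_typical = llr_outlier_score(0.0, semantic, background)
score_unusual = llr_outlier_score(5.0, semantic, background)
print(score_typical, score_unusual)
```

An instance that the background model explains better than the semantic model receives a high score and would be flagged once it exceeds the threshold.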

The perturbations are added using independent and identically distributed Bernoulli draws with rate $\mu$: each selected feature is substituted with one of the other possible feature values with equal probability. For images, this means for instance replacing a pixel with a different pixel value randomly sampled within the $0$ to $255$ pixel range. The package also contains a PixelCNN++ implementation adapted from the official TensorFlow Probability version, and available as a standalone model in alibi_detect.models.tensorflow.pixelcnn.
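A minimal sketch of such a perturbation for integer-valued features is shown below. The function mutate is a hypothetical helper written for this illustration and is not the library's mutate_fn.

```python
import random


def mutate(x, rate, feature_range=(0, 255), seed=None):
    """Replace each feature with probability `rate` by a different value in range."""
    rng = random.Random(seed)
    low, high = feature_range
    mutated = []
    for value in x:
        if rng.random() < rate:
            # Draw uniformly from the other possible values, skipping `value`.
            new_value = rng.randint(low, high - 1)
            if new_value >= value:
                new_value += 1
            mutated.append(new_value)
        else:
            mutated.append(value)
    return mutated


pixels = [0, 17, 128, 255, 42, 200]
unchanged = mutate(pixels, rate=0.0, seed=0)
all_changed = mutate(pixels, rate=1.0, seed=0)
print(unchanged, all_changed)
```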

Usage

Initialize

Parameters:

threshold: outlier threshold value used for the negative likelihood ratio. Scores above the threshold are flagged as outliers.
model: a generative model, either as a tf.keras.Model, TensorFlow Probability distribution or built-in PixelCNN++ model.
model_background: optional separate model fit on the perturbed background data. If this is not specified, a copy of model will be used.
log_prob: if the model does not have a log_prob function (as e.g. a TensorFlow Probability distribution does), a function that evaluates the log likelihood needs to be passed.
sequential: flag whether the data is sequential or not. Used to create targets during training. Defaults to False.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

Fit

We then need to train the 2 generative models in sequence. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
mutate_fn: function used to create the perturbations. Defaults to independent and identically distributed Bernoulli draws with rate $\mu$.
mutate_fn_kwargs: kwargs for mutate_fn. For the default function, the mutation rate and feature range need to be specified, e.g. dict(rate=.2, feature_range=(0,255)).
loss_fn: loss function used for the generative models.
loss_fn_kwargs: kwargs for the loss function.
epochs: number of training epochs.
batch_size: batch size used during training.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

Detect

We detect outliers by simply calling predict on a batch of instances X. Detection can be customized via the following parameters:

outlier_type: either 'instance' or 'feature'. If the outlier type equals 'instance', the outlier score at the instance level will be used to classify the instance as an outlier or not. If 'feature' is selected, outlier detection happens at the feature level (e.g. by pixel in images).
batch_size: batch size used for model prediction calls.
return_feature_score: boolean whether to return the feature level outlier scores.
return_instance_score: boolean whether to return the instance level outlier scores.

is_outlier: boolean whether instances or features are above the threshold and therefore outliers. If outlier_type equals 'instance', then the array is of shape (batch size,). If it equals 'feature', then the array is of shape (batch size, instance shape).
feature_score: contains feature level scores if return_feature_score equals True.
instance_score: contains instance level scores if return_instance_score equals True.

Examples

Image

Sequential Data

Prophet Detector

Overview

The Prophet outlier detector uses the Prophet time series forecasting package explained in the paper Forecasting at Scale (Taylor and Letham). The underlying Prophet model is a decomposable univariate time series model combining trend, seasonality and holiday effects. The model forecast also includes an uncertainty interval around the estimated trend component using the MAP estimate of the extrapolated model. Alternatively, full Bayesian inference can be done at the expense of increased compute. The upper and lower values of the uncertainty interval can then be used as outlier thresholds for each point in time. First, the distance from the observed value to the nearest uncertainty boundary (upper or lower) is computed. If the observation is within the boundaries, the outlier score equals the negative distance. As a result, the outlier score is the lowest when the observation equals the model prediction. If the observation is outside of the boundaries, the score equals the distance measure and the observation is flagged as an outlier. One of the main drawbacks of the method, however, is that the model needs to be refit as new data comes in. This is undesirable for applications with high throughput and real-time detection.
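The scoring rule described above can be sketched independently of Prophet. The function interval_outlier_score is a hypothetical helper; the interval bounds are made-up values standing in for the forecast's uncertainty interval at one point in time.

```python
def interval_outlier_score(y, lower, upper):
    """Distance to the nearest interval boundary, negative when inside the interval."""
    nearest = min(abs(y - lower), abs(upper - y))
    inside = lower <= y <= upper
    return -nearest if inside else nearest


lower, upper = 8.0, 12.0  # hypothetical uncertainty interval for one timestamp
score_inside = interval_outlier_score(10.0, lower, upper)
score_outside = interval_outlier_score(13.0, lower, upper)
is_outlier = [score > 0 for score in (score_inside, score_outside)]
print(score_inside, score_outside, is_outlier)
```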

Note

To use this detector, first install Prophet by running:

This will install Prophet and its major dependency, PyStan. PyStan currently has only partial support on Windows. If this detector is to be used on a Windows system, it is recommended to manually install (and test) PyStan before running the command above.

Usage

Initialize

Parameters:

threshold: width of the uncertainty intervals of the forecast, used as outlier threshold. Equivalent to interval_width. If the instance lies outside of the uncertainty intervals, it is flagged as an outlier. If mcmc_samples equals 0, it is the uncertainty in the trend using the MAP estimate of the extrapolated model. If mcmc_samples > 0, then uncertainty over all parameters is used.
growth: 'linear' or 'logistic' to specify a linear or logistic trend.
cap: growth cap in case growth equals 'logistic'.
holidays: pandas DataFrame with columns 'holiday' (string) and 'ds' (dates) and optionally columns 'lower_window' and 'upper_window' which specify a range of days around the date to be included as holidays.
holidays_prior_scale: parameter controlling the strength of the holiday components model. Higher values imply a more flexible trend that is more prone to overfitting.
country_holidays: include country-specific holidays via country abbreviations. The holidays for each country are provided by the holidays package in Python. A list of available countries and the country name to use is available on: https://github.com/dr-prodigy/python-holidays. Additionally, Prophet includes holidays for: Brazil (BR), Indonesia (ID), India (IN), Malaysia (MY), Vietnam (VN), Thailand (TH), Philippines (PH), Turkey (TU), Pakistan (PK), Bangladesh (BD), Egypt (EG), China (CN) and Russia (RU).
changepoint_prior_scale: parameter controlling the flexibility of the automatic changepoint selection. Large values will allow many changepoints, potentially leading to overfitting.
changepoint_range: proportion of history in which trend changepoints will be estimated. Higher values mean more changepoints, potentially leading to overfitting.
seasonality_mode: either 'additive' or 'multiplicative'.
daily_seasonality: can be 'auto', True, False, or a number of Fourier terms to generate.
weekly_seasonality: can be 'auto', True, False, or a number of Fourier terms to generate.
yearly_seasonality: can be 'auto', True, False, or a number of Fourier terms to generate.
add_seasonality: manually add one or more seasonality components. Pass a list of dicts containing the obligatory keys 'name', 'period' and 'fourier_order', and optionally the keys 'prior_scale' and 'mode'.
seasonality_prior_scale: parameter controlling the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, potentially leading to overfitting.
uncertainty_samples: number of simulated draws used to estimate uncertainty intervals.
mcmc_samples: If > 0, will do full Bayesian inference with the specified number of MCMC samples. If 0, will do MAP estimation.
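To illustrate the format expected by add_seasonality, the sketch below builds a list with one custom seasonality and checks its keys. The helper check_seasonalities and the monthly example values are assumptions made for illustration and are not part of the detector.

```python
REQUIRED_KEYS = {'name', 'period', 'fourier_order'}
OPTIONAL_KEYS = {'prior_scale', 'mode'}


def check_seasonalities(seasonalities):
    """Return True if every dict has the obligatory keys and no unknown keys.

    Hypothetical validation helper, not part of the library.
    """
    for s in seasonalities:
        keys = set(s)
        if not REQUIRED_KEYS <= keys:
            return False
        if keys - REQUIRED_KEYS - OPTIONAL_KEYS:
            return False
    return True


# one custom monthly seasonality; the values are illustrative
custom_seasonality = [
    {'name': 'monthly', 'period': 30.5, 'fourier_order': 5, 'mode': 'additive'},
]
print(check_seasonalities(custom_seasonality))
```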

Initialized outlier detector example:

Fit

We then need to train the outlier detector. The fit method takes a pandas DataFrame df with the columns 'ds', containing the dates or timestamps, and 'y', containing the time series being investigated. The date format is ideally YYYY-MM-DD and the timestamp format YYYY-MM-DD HH:MM:SS.

Detect

We detect outliers by simply calling predict on a DataFrame df, again with columns 'ds' and 'y', to compute the instance level outlier scores. We can also return the instance level outlier score or the raw Prophet model forecast by setting return_instance_score or return_forecast to True, respectively. It is important that the dates or timestamps of the test data follow those of the training data.
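Since the test timestamps have to follow the training timestamps, a chronological split is a natural way to prepare the two DataFrames. The sketch below uses plain Python lists that could be passed to pandas.DataFrame. The hourly data and the 80/20 split are assumptions for illustration.

```python
from datetime import datetime, timedelta

# hypothetical hourly series: 10 timestamps in 'YYYY-MM-DD HH:MM:SS' format
start = datetime(2020, 1, 1)
ds = [(start + timedelta(hours=i)).strftime('%Y-%m-%d %H:%M:%S') for i in range(10)]
y = [float(i % 3) for i in range(10)]

# chronological split: the first 80% is used for fit, the remainder for predict
n_train = int(0.8 * len(ds))
train = {'ds': ds[:n_train], 'y': y[:n_train]}
test = {'ds': ds[n_train:], 'y': y[n_train:]}

# every test timestamp is later than the last training timestamp
# (this string format sorts chronologically)
print(train['ds'][-1], '<', test['ds'][0])
```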

is_outlier: DataFrame with columns 'ds' containing the dates or timestamps and 'is_outlier' a boolean whether instances are above the threshold and therefore outlier instances.
instance_score: DataFrame with 'ds' and 'instance_score' which contains instance level scores if return_instance_score equals True.
forecast: DataFrame with the raw model predictions if return_forecast equals True. The DataFrame contains columns with the upper and lower boundaries ('yhat_upper' and 'yhat_lower'), the model predictions ('yhat'), and the decomposition of the prediction in the different components (trend, seasonality, holiday).

Examples

Spectral Residual

Overview

The Spectral Residual outlier detector is based on the paper Time-Series Anomaly Detection Service at Microsoft and is suitable for unsupervised online anomaly detection in univariate time series data. The algorithm first computes the Fourier Transform of the original data. Then it computes the spectral residual of the log amplitude of the transformed signal before applying the Inverse Fourier Transform to map the sequence back from the frequency to the time domain. This sequence is called the saliency map. The anomaly score is then computed as the relative difference between the saliency map values and their moving averages. If the score is above a threshold, the value at a specific timestep is flagged as an outlier. For more details, please check out the paper.
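The steps above can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the detector's implementation: the function name and window sizes are arbitrary, and each moving average uses zero padding at the edges (np.convolve with mode='same') instead of the padding options the detector exposes.

```python
import numpy as np


def spectral_residual_score(x, window_amp=3, window_local=21, eps=1e-8):
    """Saliency-map based outlier score for a 1D series (illustrative sketch)."""
    fft = np.fft.fft(x)
    log_amp = np.log(np.abs(fft) + eps)
    phase = np.angle(fft)
    # spectral residual: log amplitude minus its moving average
    kernel_amp = np.ones(window_amp) / window_amp
    residual = log_amp - np.convolve(log_amp, kernel_amp, mode='same')
    # inverse transform back to the time domain gives the saliency map
    saliency = np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))
    # relative difference between the saliency map and its moving average
    kernel_local = np.ones(window_local) / window_local
    saliency_avg = np.convolve(saliency, kernel_local, mode='same')
    return (saliency - saliency_avg) / (saliency_avg + eps)


x = np.ones(100)
x[50] = 10.  # a single spike in an otherwise constant series
score = spectral_residual_score(x)
print(score.shape, int(np.argmax(score)))
```

In this toy series the highest score falls on the timestep of the spike.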

Usage

Initialize

Parameters:

threshold: Threshold used to classify outliers. Relative saliency map distance from the moving average.
window_amp: Window used for the moving average in the spectral residual computation. The spectral residual is the difference between the log amplitude of the Fourier Transform and a convolution of the log amplitude over window_amp.
window_local: Window used for the moving average in the outlier score computation. The outlier score computes the relative difference between the saliency map and a moving average of the saliency map over window_local timesteps.
padding_amp_method: Padding method to be used prior to each convolution over log amplitude. Possible values: constant | replicate | reflect. Default value: replicate.
- constant - padding with constant 0.
- replicate - repeats the last/extreme value.
- reflect - reflects the time series.
padding_local_method: Padding method to be used prior to each convolution over saliency map. Possible values: constant | replicate | reflect. Default value: replicate.
- constant - padding with constant 0.
- replicate - repeats the last/extreme value.
- reflect - reflects the time series.
padding_amp_side: Whether to pad the amplitudes on both sides or only on one side. Possible values: bilateral | left | right.
n_est_points: Number of estimated points padded to the end of the sequence.
n_grad_points: Number of points used for the gradient estimation of the additional points padded to the end of the sequence. The paper sets this value to 5.

Initialized outlier detector example:

It is often hard to find a good threshold value. If we have a time series containing both normal and outlier data and we know approximately the percentage of normal data in the time series, we can infer a suitable threshold:
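The idea behind inferring a threshold from the share of normal data can be sketched as a percentile of the outlier scores. The helper name and the toy scores below are assumptions for illustration; the detector's own method may differ in detail.

```python
import numpy as np


def infer_threshold_from_scores(scores, threshold_perc):
    """Score value below which roughly threshold_perc % of the scores fall."""
    return float(np.percentile(scores, threshold_perc))


# hypothetical outlier scores for 100 timesteps
scores = np.arange(100, dtype=float)
threshold = infer_threshold_from_scores(scores, threshold_perc=95)
is_outlier = scores > threshold
print(threshold, int(is_outlier.sum()))
```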

Detect

We detect outliers by simply calling predict on a time series X to compute the outlier scores and flag the anomalies. We can also return the instance (timestep) level outlier score by setting return_instance_score to True.

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (timesteps,).
instance_score: contains instance level scores if return_instance_score equals True.

Examples

Sequence-to-Sequence (Seq2Seq)

Overview

The Sequence-to-Sequence (Seq2Seq) outlier detector consists of 2 main building blocks: an encoder and a decoder. The encoder consists of a bidirectional LSTM which processes the input sequence and initializes the decoder. The LSTM decoder then makes sequential predictions for the output sequence. In our case, the decoder aims to reconstruct the input sequence. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.

Since even for normal data the reconstruction error can be state-dependent, we add an outlier threshold estimator network to the Seq2Seq model. This network takes in the hidden state of the decoder at each timestep and predicts the estimated reconstruction error for normal data. As a result, the outlier threshold is not static and becomes a function of the model state. This is similar to earlier work on state-dependent thresholds, but while that work trains the threshold estimator separately from the Seq2Seq model with a Support-Vector Regressor, we train a neural net regression network end-to-end with the Seq2Seq model.
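A state-dependent threshold can be pictured as follows. The sketch assumes that the estimated reconstruction error for normal data is subtracted from the squared reconstruction error before a static threshold is applied. The text does not spell out how the two are combined, so this combination, the helper name and the numbers are all assumptions.

```python
import numpy as np


def adjusted_outlier_score(x, x_recon, estimated_error):
    """Squared reconstruction error minus the error expected for normal data."""
    squared_error = (x - x_recon) ** 2
    return squared_error - estimated_error


# hypothetical values for 3 timesteps of a univariate series
x = np.array([1.0, 2.0, 3.0])
x_recon = np.array([1.0, 2.5, 1.0])
estimated_error = np.array([0.1, 0.3, 0.2])  # would come from the threshold network
score = adjusted_outlier_score(x, x_recon, estimated_error)
is_outlier = score > 0.0  # static threshold applied on top of the adjustment
print(score, is_outlier)
```

The second timestep has a non-zero reconstruction error but is not flagged, because the error stays below what the estimator expects for normal data in that state.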

The detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The Seq2Seq outlier detector is suitable for both univariate and multivariate time series.

Usage

Initialize

Parameters:

n_features: number of features in the time series.
seq_len: sequence length fed into the Seq2Seq model.
threshold: threshold used for outlier detection. Can be a float or feature-wise array.
seq2seq: optionally pass an already defined or pretrained Seq2Seq model to the outlier detector as a tf.keras.Model.
threshold_net: optionally pass the layers for the threshold estimation network wrapped in a tf.keras.Sequential instance. Example:

latent_dim: latent dimension of the encoder and decoder.
output_activation: activation used in the Dense output layer of the decoder.
beta: weight on the threshold estimation mean-squared error (MSE) loss term.

Initialized outlier detector example:

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: univariate or multivariate time series array with preferably normal data used for training. Shape equals (batch, n_features) or (batch, seq_len, n_features).
loss_fn: loss function used for training. Defaults to the MSE loss.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.
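To make the two accepted shapes of X concrete, the sketch below cuts a (timesteps, n_features) array into non-overlapping windows of shape (batch, seq_len, n_features). The helper to_sequences is an assumption for illustration; how the detector itself reshapes two-dimensional input is not shown here.

```python
import numpy as np


def to_sequences(X, seq_len):
    """Cut a (timesteps, n_features) array into non-overlapping windows of
    shape (batch, seq_len, n_features); trailing timesteps are dropped."""
    n_timesteps, n_features = X.shape
    batch = n_timesteps // seq_len
    return X[:batch * seq_len].reshape(batch, seq_len, n_features)


X = np.arange(22, dtype='float32').reshape(11, 2)  # 11 timesteps, 2 features
X_seq = to_sequences(X, seq_len=5)
print(X_seq.shape)
```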

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold. We can either set the threshold over both features combined or determine a feature-wise threshold. Here we opt for the feature-wise threshold. This is for instance useful when different features have different variance or sensitivity to outliers. The snippet assumes there are about 5% outliers in the first feature and 10% in the second:
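A feature-wise threshold of this kind amounts to one percentile per feature. The sketch below uses randomly generated scores in place of real feature level outlier scores, so the data, variable names and seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical feature level outlier scores: 1000 instances, 2 features
feature_scores = rng.normal(size=(1000, 2))

# percentage of normal data per feature: 95% for the first, 90% for the second
threshold_perc = [95., 90.]
threshold = np.array([
    np.percentile(feature_scores[:, i], perc)
    for i, perc in enumerate(threshold_perc)
])

# fraction of values flagged per feature
flagged = (feature_scores > threshold).mean(axis=0)
print(threshold.shape, flagged)
```

Each feature ends up with its own cut-off, so a feature with larger variance does not dominate the detection.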

Detect

We detect outliers by simply calling predict on a batch of instances X. Detection can be customized via the following parameters:

outlier_type: either 'instance' or 'feature'. If the outlier type equals 'instance', the outlier score at the instance level will be used to classify the instance as an outlier or not. If 'feature' is selected, outlier detection happens at the feature level. It is important to distinguish 2 use cases:
- X has shape (batch, n_features):
  - There are batch instances with n_features features per instance.
- X has shape (batch, seq_len, n_features)
  - Now there are batch instances with seq_len x n_features features per instance.
outlier_perc: percentage of the feature level outlier scores, sorted in descending order, that is used to compute the instance level outlier score. We might for instance want to flag a multivariate time series as an outlier at a specific timestamp if at least 75% of the feature values are on average above the threshold. In this case, we set outlier_perc to 75. The default value is 100 (using all the features).
return_feature_score: boolean whether to return the feature level outlier scores.
return_instance_score: boolean whether to return the instance level outlier scores.

is_outlier: boolean whether instances or features are above the threshold and therefore outliers. If outlier_type equals 'instance', then the array is of shape (batch,). If it equals 'feature', then the array is of shape (batch, seq_len, n_features) or (batch, n_features), depending on the shape of X.
feature_score: contains feature level scores if return_feature_score equals True.
instance_score: contains instance level scores if return_instance_score equals True.
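The effect of outlier_perc on the instance level score can be sketched as the mean over the largest feature level scores. The function below is an illustrative assumption about how such a score could be computed, not a copy of the detector's code.

```python
import numpy as np


def instance_score(feature_scores, outlier_perc=100.):
    """Mean of the top outlier_perc % feature scores per instance."""
    flat = feature_scores.reshape(feature_scores.shape[0], -1)
    n_used = int(np.ceil(outlier_perc / 100 * flat.shape[1]))
    top = np.sort(flat, axis=1)[:, -n_used:]  # the n_used largest scores
    return top.mean(axis=1)


# one instance with seq_len=2 and n_features=2, i.e. 4 feature scores
fscore = np.array([[[0.1, 0.2], [0.3, 0.8]]])
print(instance_score(fscore, 100), instance_score(fscore, 50))
```

Lowering outlier_perc raises the instance score here, because only the highest feature scores enter the average.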

Examples

AE outlier detection on CIFAR10

Method

Dataset

CIFAR10 consists of 60,000 32 by 32 RGB images equally distributed over 10 classes.

import logging
import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf
tf.keras.backend.clear_session()
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, \
    Dense, Layer, Reshape, InputLayer, Flatten
from tqdm import tqdm

from alibi_detect.od import OutlierAE
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.utils.perturbation import apply_mask
from alibi_detect.saving import save_detector, load_detector
from alibi_detect.utils.visualize import plot_instance_score, plot_feature_outlier_image

logger = tf.get_logger()
logger.setLevel(logging.ERROR)

Load CIFAR10 data

train, test = tf.keras.datasets.cifar10.load_data()
X_train, y_train = train
X_test, y_test = test

X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

Load or define outlier detector

The pretrained outlier and adversarial detectors used in the example notebooks are stored in a Google Cloud bucket. You can use the built-in fetch_detector function, which saves the pretrained models in a local directory filepath and loads the detector. Alternatively, you can train a detector from scratch:

load_outlier_detector = True

filepath = 'my_path'  # change to (absolute) directory where model is downloaded
detector_type = 'outlier'
dataset = 'cifar10'
detector_name = 'OutlierAE'
filepath = os.path.join(filepath, detector_name)
if load_outlier_detector:  # load pretrained outlier detector
    od = fetch_detector(filepath, detector_type, dataset, detector_name)
else:  # define model, initialize, train and save outlier detector
    encoding_dim = 1024
    
    encoder_net = tf.keras.Sequential(
      [
          InputLayer(input_shape=(32, 32, 3)),
          Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
          Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
          Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu),
          Flatten(),
          Dense(encoding_dim,)
      ])

    decoder_net = tf.keras.Sequential(
      [
          InputLayer(input_shape=(encoding_dim,)),
          Dense(4*4*128),
          Reshape(target_shape=(4, 4, 128)),
          Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
          Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
          Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
      ])
    
    # initialize outlier detector
    od = OutlierAE(threshold=.015,  # threshold for outlier score
                    encoder_net=encoder_net,  # can also pass AE model instead
                    decoder_net=decoder_net,  # of separate encoder and decoder
                    )
    # train
    od.fit(X_train,
           epochs=50,
           verbose=True)
    
    # save the trained outlier detector
    save_detector(od, filepath)

Check quality AE model

idx = 8
X = X_train[idx].reshape(1, 32, 32, 3)
X_recon = od.ae(X)

plt.imshow(X.reshape(32, 32, 3))
plt.axis('off')
plt.show()

plt.imshow(X_recon.numpy().reshape(32, 32, 3))
plt.axis('off')
plt.show()

Check outliers on original CIFAR images

X = X_train[:500]
print(X.shape)

od_preds = od.predict(X,
                      outlier_type='instance',    # use 'feature' or 'instance' level
                      return_feature_score=True,  # scores used to determine outliers
                      return_instance_score=True)
print(list(od_preds['data'].keys()))

Plot instance level outlier scores

target = np.zeros(X.shape[0],).astype(int)  # all normal CIFAR10 training instances
labels = ['normal', 'outlier']
plot_instance_score(od_preds, target, labels, od.threshold)

Visualize predictions

X_recon = od.ae(X).numpy()
plot_feature_outlier_image(od_preds, 
                           X, 
                           X_recon=X_recon,
                           instance_ids=[8, 60, 100, 330],  # pass a list with indices of instances to display
                           max_instances=5,  # max nb of instances to display
                           outliers_only=False)  # only show outlier predictions

Predict outliers on perturbed CIFAR images

We perturb CIFAR images by adding random noise to patches (masks) of the image. For each of the n_mask_sizes mask sizes, we sample n_masks masks and apply them to each of the n_imgs images. Then we predict outliers on the masked instances:

# nb of predictions per image: n_masks * n_mask_sizes 
n_mask_sizes = 10
n_masks = 20
n_imgs = 50

Define masks and get images:

mask_sizes = [(2*n,2*n) for n in range(1,n_mask_sizes+1)]
print(mask_sizes)
img_ids = np.arange(n_imgs)
X_orig = X[img_ids].reshape(img_ids.shape[0], 32, 32, 3)
print(X_orig.shape)

Calculate instance level outlier scores:

all_img_scores = []
for i in tqdm(range(X_orig.shape[0])):
    img_scores = np.zeros((len(mask_sizes),))
    for j, mask_size in enumerate(mask_sizes):
        # create masked instances
        X_mask, mask = apply_mask(X_orig[i].reshape(1, 32, 32, 3),
                                  mask_size=mask_size,
                                  n_masks=n_masks,
                                  channels=[0,1,2],
                                  mask_type='normal',
                                  noise_distr=(0,1),
                                  clip_rng=(0,1))
        # predict outliers
        od_preds_mask = od.predict(X_mask)
        score = od_preds_mask['data']['instance_score']
        # store average score over `n_masks` for a given mask size
        img_scores[j] = np.mean(score)
    all_img_scores.append(img_scores)
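The masking step used in the loop above can be pictured with a small NumPy sketch. It is not the library's apply_mask function: the helper apply_noise_patch is hypothetical, handles a single image and a single patch, and only borrows the meaning of mask_size, noise_distr and clip_rng from the call above.

```python
import numpy as np


def apply_noise_patch(img, mask_size, rng, noise_distr=(0., 1.), clip_rng=(0., 1.)):
    """Add Gaussian noise to one randomly placed patch of an (H, W, C) image."""
    h, w = mask_size
    top = rng.integers(0, img.shape[0] - h + 1)
    left = rng.integers(0, img.shape[1] - w + 1)
    mask = np.zeros_like(img)
    # noise drawn from N(mean, std) fills the patch across all channels
    mask[top:top + h, left:left + w, :] = rng.normal(
        noise_distr[0], noise_distr[1], size=(h, w, img.shape[2]))
    return np.clip(img + mask, clip_rng[0], clip_rng[1]), mask


rng = np.random.default_rng(0)
img = np.full((32, 32, 3), 0.5, dtype='float32')
img_masked, mask = apply_noise_patch(img, mask_size=(4, 4), rng=rng)
print(img_masked.shape, int((mask != 0).sum()))
```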

Visualize outlier scores vs. mask sizes

x_plt = [mask[0] for mask in mask_sizes]

for ais in all_img_scores:
    plt.plot(x_plt, ais)
    plt.xticks(x_plt)
plt.title('Outlier Score All Images for Increasing Mask Size')
plt.xlabel('Mask size')
plt.ylabel('Outlier Score')
plt.show()

ais_np = np.stack(all_img_scores)  # shape: (n_imgs, n_mask_sizes)
ais_mean = np.mean(ais_np, axis=0)
plt.title('Mean Outlier Score All Images for Increasing Mask Size')
plt.xlabel('Mask size')
plt.ylabel('Outlier score')
plt.plot(x_plt, ais_mean)
plt.xticks(x_plt)
plt.show()

Investigate instance level outlier

i = 8  # index of instance to look at

plt.plot(x_plt, all_img_scores[i])
plt.xticks(x_plt)
plt.title('Outlier Scores Image {} for Increasing Mask Size'.format(i))
plt.xlabel('Mask size')
plt.ylabel('Outlier score')
plt.show()

Reconstruction of masked images and outlier scores per channel:

all_X_mask = []
X_i = X_orig[i].reshape(1, 32, 32, 3)
all_X_mask.append(X_i)
# apply masks
for j, mask_size in enumerate(mask_sizes):
    # create masked instances
    X_mask, mask = apply_mask(X_i,
                              mask_size=mask_size,
                              n_masks=1,  # just 1 for visualization purposes
                              channels=[0,1,2],
                              mask_type='normal',
                              noise_distr=(0,1),
                              clip_rng=(0,1))
    all_X_mask.append(X_mask)
all_X_mask = np.concatenate(all_X_mask, axis=0)
all_X_recon = od.ae(all_X_mask).numpy()
od_preds = od.predict(all_X_mask)

Visualize:

plot_feature_outlier_image(od_preds, 
                           all_X_mask, 
                           X_recon=all_X_recon, 
                           max_instances=all_X_mask.shape[0], 
                           n_channels=3)

Predict outliers on a subset of features

The sensitivity of the outlier detector can not only be controlled via the threshold, but also by selecting the percentage of the features used for the instance level outlier score computation. For instance, we might want to flag outliers if 40% of the features (pixels for images) have an average outlier score above the threshold. This is possible via the outlier_perc argument in the predict function. It specifies the percentage of the features that are used for outlier detection, sorted in descending outlier score order.

perc_list = [20, 40, 60, 80, 100]

all_perc_scores = []
for perc in perc_list:
    od_preds_perc = od.predict(all_X_mask, outlier_perc=perc)
    iscore = od_preds_perc['data']['instance_score']
    all_perc_scores.append(iscore)

Visualize outlier scores vs. mask sizes and percentage of features used:

x_plt = [0] + x_plt
for aps in all_perc_scores:
    plt.plot(x_plt, aps)
    plt.xticks(x_plt)
plt.legend(perc_list)
plt.title('Outlier Score for Increasing Mask Size and Different Feature Subsets')
plt.xlabel('Mask Size')
plt.ylabel('Outlier Score')
plt.show()

Infer outlier threshold value

Finding good threshold values can be tricky since they are typically not easy to interpret. The infer_threshold method helps find a sensible value. We need to pass a batch of instances X and specify what percentage of those we consider to be normal via threshold_perc.

print('Current threshold: {}'.format(od.threshold))
od.infer_threshold(X, threshold_perc=99)  # assume 1% of the training data are outliers
print('New threshold: {}'.format(od.threshold))

AEGMM and VAEGMM outlier detection on KDD Cup '99 dataset

Method

The AEGMM method follows the Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection ICLR 2018 paper. The encoder compresses the data while the reconstructed instances generated by the decoder are used to create additional features based on the reconstruction error between the input and the reconstructions. These features are combined with encodings and fed into a Gaussian Mixture Model (GMM). Training of the AEGMM model is unsupervised on normal (inlier) data. The sample energy of the GMM can then be used to determine whether an instance is an outlier (high sample energy) or not (low sample energy). VAEGMM on the other hand uses a variational autoencoder instead of a plain autoencoder.
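The additional features derived from the reconstruction error can be sketched as a cosine similarity and a relative Euclidean distance between each input and its reconstruction. The NumPy function below is an illustrative assumption; the library's own feature function may differ, for example in sign conventions or clipping.

```python
import numpy as np


def recon_features_sketch(x, x_recon, eps=1e-12):
    """Cosine similarity and relative Euclidean distance per instance."""
    dot = np.sum(x * x_recon, axis=1)
    norm_x = np.linalg.norm(x, axis=1)
    norm_recon = np.linalg.norm(x_recon, axis=1)
    cos = dot / (norm_x * norm_recon + eps)
    rel_eucl = np.linalg.norm(x - x_recon, axis=1) / (norm_x + eps)
    return np.stack([cos, rel_eucl], axis=1)


x = np.array([[3., 4.], [1., 0.]])
x_recon = np.array([[3., 4.], [0., 1.]])  # perfect and orthogonal reconstruction
features = recon_features_sketch(x, x_recon)
print(features)
```

A perfect reconstruction gives a cosine similarity of 1 and a distance of 0, while a poor reconstruction moves both features away from those values.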

Dataset

The outlier detector needs to detect computer network intrusions using TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. A connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address under some well defined protocol. Each connection is labeled as either normal, or as an attack.

There are 4 types of attacks in the dataset:

DOS: denial-of-service, e.g. syn flood;
R2L: unauthorized access from a remote machine, e.g. guessing password;
U2R: unauthorized access to local superuser (root) privileges;
probing: surveillance and other probing, e.g., port scanning.

The dataset contains about 5 million connection records.

There are 3 types of features:

basic features of individual connections, e.g. duration of connection
content features within a connection, e.g. number of failed login attempts
traffic features within a 2 second window, e.g. number of connections to the same host as the current connection

This notebook requires the seaborn package for visualization which can be installed via pip:

!pip install seaborn

import os
import logging
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix, f1_score
import tensorflow as tf
tf.keras.backend.clear_session()
from tensorflow.keras.layers import Dense, InputLayer

from alibi_detect.datasets import fetch_kdd
from alibi_detect.models.tensorflow import eucl_cosim_features
from alibi_detect.od import OutlierAEGMM, OutlierVAEGMM
from alibi_detect.utils.data import create_outlier_batch
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.saving import save_detector, load_detector
from alibi_detect.utils.visualize import plot_instance_score, plot_feature_outlier_tabular, plot_roc

logger = tf.get_logger()
logger.setLevel(logging.ERROR)

Load dataset

We only keep a number of continuous (18 out of 41) features.

kddcup = fetch_kdd(percent10=True)  # only load 10% of the dataset
print(kddcup.data.shape, kddcup.target.shape)

Assume that a model is trained on normal instances of the dataset (not outliers) and standardization is applied:

np.random.seed(0)
normal_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=400000, perc_outlier=0)
X_train, y_train = normal_batch.data.astype('float32'), normal_batch.target
print(X_train.shape, y_train.shape)
print('{}% outliers'.format(100 * y_train.mean()))

mean, stdev = X_train.mean(axis=0), X_train.std(axis=0)

Apply standardization:

X_train = (X_train - mean) / stdev

Load or define AEGMM outlier detector

load_outlier_detector = True

filepath = 'my_path'  # change to directory (absolute path) where model is downloaded
detector_type = 'outlier'
dataset = 'kddcup'
detector_name = 'OutlierAEGMM'
filepath = os.path.join(filepath, detector_name)
if load_outlier_detector:  # load pretrained outlier detector
    od = fetch_detector(filepath, detector_type, dataset, detector_name)
else:  # define model, initialize, train and save outlier detector
    # the model defined here is similar to the one defined in the original paper
    n_features = X_train.shape[1]
    latent_dim = 1
    n_gmm = 2  # nb of components in GMM

    encoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(n_features,)),
        Dense(60, activation=tf.nn.tanh),
        Dense(30, activation=tf.nn.tanh),
        Dense(10, activation=tf.nn.tanh),
        Dense(latent_dim, activation=None)
    ])

    decoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(latent_dim,)),
        Dense(10, activation=tf.nn.tanh),
        Dense(30, activation=tf.nn.tanh),
        Dense(60, activation=tf.nn.tanh),
        Dense(n_features, activation=None)
    ])

    gmm_density_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(latent_dim + 2,)),
        Dense(10, activation=tf.nn.tanh),
        Dense(n_gmm, activation=tf.nn.softmax)
    ])
    
    # initialize outlier detector
    od = OutlierAEGMM(threshold=None,  # threshold for outlier score
                      encoder_net=encoder_net,         # can also pass AEGMM model instead
                      decoder_net=decoder_net,         # of separate encoder, decoder
                      gmm_density_net=gmm_density_net, # and gmm density net
                      n_gmm=n_gmm,
                      recon_features=eucl_cosim_features)  # fn used to derive features
                                                           # from the reconstructed
                                                           # instances based on cosine 
                                                           # similarity and Eucl distance 
    
    # train
    od.fit(X_train,
           epochs=50,
           batch_size=1024,
           verbose=True)
    
    # save the trained outlier detector
    save_detector(od, filepath)

The warning tells us we still need to set the outlier threshold. This can be done with the infer_threshold method. We need to pass a batch of instances and specify what percentage of those we consider to be normal via threshold_perc. Let's assume we have some data which we know contains around 5% outliers. The percentage of outliers can be set with perc_outlier in the create_outlier_batch function.

np.random.seed(0)
perc_outlier = 5
threshold_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=perc_outlier)
X_threshold, y_threshold = threshold_batch.data.astype('float32'), threshold_batch.target
X_threshold = (X_threshold - mean) / stdev
print('{}% outliers'.format(100 * y_threshold.mean()))

od.infer_threshold(X_threshold, threshold_perc=100-perc_outlier)
print('New threshold: {}'.format(od.threshold))

Save outlier detector with updated threshold:

save_detector(od, filepath)

Detect outliers

We now generate a batch of data with 10% outliers and detect the outliers in the batch.

np.random.seed(1)
outlier_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=10)
X_outlier, y_outlier = outlier_batch.data.astype('float32'), outlier_batch.target
X_outlier = (X_outlier - mean) / stdev
print(X_outlier.shape, y_outlier.shape)
print('{}% outliers'.format(100 * y_outlier.mean()))

Predict outliers:

od_preds = od.predict(X_outlier, return_instance_score=True)

Display results

F1 score and confusion matrix:

labels = outlier_batch.target_names
y_pred = od_preds['data']['is_outlier']
f1 = f1_score(y_outlier, y_pred)
print('F1 score: {:.4f}'.format(f1))
cm = confusion_matrix(y_outlier, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()
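For reference, the F1 score can be derived directly from the confusion matrix counts. The sketch below uses made-up counts and a hypothetical helper f1_from_counts. It follows the row and column layout of scikit-learn's confusion_matrix, with true labels as rows and predictions as columns.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# hypothetical counts: rows are true labels, columns are predictions
cm = [[880, 20],   # normal:  TN, FP
      [10, 90]]    # outlier: FN, TP
(tn, fp), (fn, tp) = cm
f1_example = f1_from_counts(tp, fp, fn)
print(round(f1_example, 4))
```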

Plot instance level outlier scores vs. the outlier threshold:

plot_instance_score(od_preds, y_outlier, labels, od.threshold, ylim=(None, None))

We can also plot the ROC curve for the outlier scores of the detector:

roc_data = {'AEGMM': {'scores': od_preds['data']['instance_score'], 'labels': y_outlier}}
plot_roc(roc_data)

Investigate results

We can visualize the encodings of the instances in the latent space and the features derived from the instance reconstructions by the decoder. The encodings and features are then fed into the GMM density network.

enc = od.aegmm.encoder(X_outlier)  # encoding
X_recon = od.aegmm.decoder(enc)  # reconstructed instances
recon_features = od.aegmm.recon_features(X_outlier, X_recon)  # reconstructed features

df = pd.DataFrame(dict(enc=enc[:, 0].numpy(), 
                       cos=recon_features[:, 0].numpy(), 
                       eucl=recon_features[:, 1].numpy(), 
                       label=y_outlier))

groups = df.groupby('label')
fig, ax = plt.subplots()
for name, group in groups:
    ax.plot(group.enc, group.cos, marker='o', 
            linestyle='', ms=6, label=labels[name])
plt.title('Encoding vs. Cosine Similarity')
plt.xlabel('Encoding')
plt.ylabel('Cosine Similarity')
ax.legend()
plt.show()

fig, ax = plt.subplots()
for name, group in groups:
    ax.plot(group.enc, group.eucl, marker='o', 
            linestyle='', ms=6, label=labels[name])
plt.title('Encoding vs. Relative Euclidean Distance')
plt.xlabel('Encoding')
plt.ylabel('Relative Euclidean Distance')
ax.legend()
plt.show()

A lot of the outliers are already separated well in the latent space.

Use VAEGMM outlier detector

We can again instantiate the pretrained VAEGMM detector from the Google Cloud Bucket. You can use the built-in fetch_detector function which saves the pre-trained models in a local directory filepath and loads the detector. Alternatively, you can train a detector from scratch:

load_outlier_detector = True

filepath = 'my_path'  # change to directory (absolute path) where model is downloaded
detector_type = 'outlier'
dataset = 'kddcup'
detector_name = 'OutlierVAEGMM'
filepath = os.path.join(filepath, detector_name)
if load_outlier_detector:  # load pretrained outlier detector
    od = fetch_detector(filepath, detector_type, dataset, detector_name)
else:  # define model, initialize, train and save outlier detector
    # the model defined here is similar to the one defined in
    # the OutlierVAE notebook
    n_features = X_train.shape[1]
    latent_dim = 2
    n_gmm = 2

    encoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(n_features,)),
        Dense(20, activation=tf.nn.relu),
        Dense(15, activation=tf.nn.relu),
        Dense(7, activation=tf.nn.relu)
    ])

    decoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(latent_dim,)),
        Dense(7, activation=tf.nn.relu),
        Dense(15, activation=tf.nn.relu),
        Dense(20, activation=tf.nn.relu),
        Dense(n_features, activation=None)
    ])

    gmm_density_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(latent_dim + 2,)),
        Dense(10, activation=tf.nn.relu),
        Dense(n_gmm, activation=tf.nn.softmax)
    ])
    
    
    # initialize outlier detector
    od = OutlierVAEGMM(threshold=None,
                       encoder_net=encoder_net,
                       decoder_net=decoder_net,
                       gmm_density_net=gmm_density_net,
                       n_gmm=n_gmm,
                       latent_dim=latent_dim,
                       samples=10,
                       recon_features=eucl_cosim_features)
    
    # train
    od.fit(X_train,
           epochs=50,
           batch_size=1024,
           cov_elbo=dict(sim=.0025),  # standard deviation assumption
           verbose=True)           # for elbo training
    
    # save the trained outlier detector
    save_detector(od, filepath)

We need to infer the threshold again:

od.infer_threshold(X_threshold, threshold_perc=100-perc_outlier)
print('New threshold: {}'.format(od.threshold))

Save outlier detector with updated threshold:

save_detector(od, filepath)

Detect outliers and display results

Predict:

od_preds = od.predict(X_outlier, return_instance_score=True)

F1 score and confusion matrix:

labels = outlier_batch.target_names
y_pred = od_preds['data']['is_outlier']
f1 = f1_score(y_outlier, y_pred)
print('F1 score: {:.4f}'.format(f1))
cm = confusion_matrix(y_outlier, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()

Plot instance level outlier scores vs. the outlier threshold:

plot_instance_score(od_preds, y_outlier, labels, od.threshold, ylim=(None, None))

You can zoom in by adjusting the min and max values in ylim. We can also compare the VAEGMM ROC curve with AEGMM:

roc_data['VAEGMM'] = {'scores': od_preds['data']['instance_score'], 'labels': y_outlier}
plot_roc(roc_data)

Isolation Forest outlier detection on KDD Cup '99 dataset

Method

Dataset

There are 4 types of attacks in the dataset:

DOS: denial-of-service, e.g. syn flood;
R2L: unauthorized access from a remote machine, e.g. guessing password;
U2R: unauthorized access to local superuser (root) privileges;
probing: surveillance and other probing, e.g., port scanning.

The dataset contains about 5 million connection records.

There are 3 types of features:

basic features of individual connections, e.g. duration of connection
content features within a connection, e.g. number of failed log in attempts
traffic features within a 2 second window, e.g. number of connections to the same host as the current connection

This notebook requires the seaborn package for visualization which can be installed via pip:

!pip install seaborn

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix, f1_score

from alibi_detect.od import IForest
from alibi_detect.datasets import fetch_kdd
from alibi_detect.utils.data import create_outlier_batch
from alibi_detect.saving import save_detector, load_detector
from alibi_detect.utils.visualize import plot_instance_score, plot_roc

Load dataset

We keep only 18 of the 41 features, all of them continuous.

kddcup = fetch_kdd(percent10=True)  # only load 10% of the dataset
print(kddcup.data.shape, kddcup.target.shape)

Assume that a model is trained on normal instances of the dataset (not outliers) and standardization is applied:

np.random.seed(0)
normal_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=400000, perc_outlier=0)
X_train, y_train = normal_batch.data.astype('float'), normal_batch.target
print(X_train.shape, y_train.shape)
print('{}% outliers'.format(100 * y_train.mean()))

mean, stdev = X_train.mean(axis=0), X_train.std(axis=0)

Apply standardization:

X_train = (X_train - mean) / stdev
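
The standardization step can be wrapped in two small helpers so that the statistics of the normal training data are reused for every later batch. This is a minimal sketch on synthetic data. The helper names and the guard against zero-variance features are assumptions added here and are not part of the notebook.

```python
import numpy as np

def fit_standardizer(X_ref, eps=1e-8):
    """Compute feature-wise mean and standard deviation on reference data."""
    mean_ref = X_ref.mean(axis=0)
    stdev_ref = X_ref.std(axis=0)
    # avoid division by zero for constant features
    stdev_ref = np.where(stdev_ref < eps, 1.0, stdev_ref)
    return mean_ref, stdev_ref

def standardize(X, mean_ref, stdev_ref):
    """Apply the reference statistics to any batch."""
    return (X - mean_ref) / stdev_ref

rng = np.random.default_rng(0)
X_ref = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
X_ref[:, 2] = 7.0  # a constant feature
mean_ref, stdev_ref = fit_standardizer(X_ref)
X_ref_std = standardize(X_ref, mean_ref, stdev_ref)
X_new_std = standardize(rng.normal(5.0, 2.0, size=(10, 3)), mean_ref, stdev_ref)
```

Reusing the training statistics keeps the threshold batch and the test batch on the same scale as the data the detector was fitted on.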

Define outlier detector

We train an outlier detector from scratch:

filepath = 'my_path'  # change to directory where model is saved
detector_name = 'IForest'
filepath = os.path.join(filepath, detector_name)

# initialize outlier detector
od = IForest(threshold=None,  # threshold for outlier score
             n_estimators=100)

# train
od.fit(X_train)

# save the trained outlier detector
save_detector(od, filepath)

np.random.seed(0)
perc_outlier = 5
threshold_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=perc_outlier)
X_threshold, y_threshold = threshold_batch.data.astype('float'), threshold_batch.target
X_threshold = (X_threshold - mean) / stdev
print('{}% outliers'.format(100 * y_threshold.mean()))

od.infer_threshold(X_threshold, threshold_perc=100-perc_outlier)
print('New threshold: {}'.format(od.threshold))
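
Conceptually, inferring a threshold from threshold_perc amounts to taking a percentile of the outlier scores of the batch. The sketch below illustrates that idea on synthetic scores. It assumes that infer_threshold behaves like a plain percentile, which is not confirmed here, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_perc_outlier = 5

# synthetic outlier scores: 950 low scores and 50 high scores
toy_scores = np.concatenate([rng.normal(0.0, 1.0, 950),
                             rng.normal(6.0, 1.0, 50)])

# scores above the (100 - perc_outlier)th percentile are flagged
toy_threshold = np.percentile(toy_scores, 100 - toy_perc_outlier)
toy_is_outlier = (toy_scores > toy_threshold).astype(int)
print('Threshold: {:.3f}'.format(toy_threshold))
print('Flagged fraction: {:.3f}'.format(toy_is_outlier.mean()))
```

The flagged fraction matches the assumed percentage of outliers by construction, which is why a poor estimate of that percentage leads directly to a poor threshold.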

Let's save the outlier detector with updated threshold:

save_detector(od, filepath)

Detect outliers

We now generate a batch of data with 10% outliers and detect the outliers in the batch.

np.random.seed(1)
outlier_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=10)
X_outlier, y_outlier = outlier_batch.data.astype('float'), outlier_batch.target
X_outlier = (X_outlier - mean) / stdev
print(X_outlier.shape, y_outlier.shape)
print('{}% outliers'.format(100 * y_outlier.mean()))
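
To make the batch construction concrete, the following sketch samples a batch with a fixed percentage of outliers from labelled data. The function make_outlier_batch is a hypothetical stand-in written for illustration. It is not the library's create_outlier_batch and may differ from it, for example in how instances are sampled.

```python
import numpy as np

def make_outlier_batch(X, y, n_samples, perc_outlier, seed=0):
    """Sample a shuffled batch where perc_outlier percent of rows have y == 1."""
    rng = np.random.default_rng(seed)
    n_outlier = int(round(n_samples * perc_outlier / 100))
    n_normal = n_samples - n_outlier
    idx_normal = rng.choice(np.flatnonzero(y == 0), size=n_normal, replace=False)
    idx_outlier = rng.choice(np.flatnonzero(y == 1), size=n_outlier, replace=False)
    idx = rng.permutation(np.concatenate([idx_normal, idx_outlier]))
    return X[idx], y[idx]

# toy data: 1000 rows, the first 200 labelled as outliers
X_all = np.arange(2000, dtype=float).reshape(1000, 2)
y_all = np.zeros(1000, dtype=int)
y_all[:200] = 1

X_batch, y_batch = make_outlier_batch(X_all, y_all, n_samples=100, perc_outlier=10)
print(X_batch.shape, '{}% outliers'.format(100 * y_batch.mean()))
```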

Predict outliers:

od_preds = od.predict(X_outlier, return_instance_score=True)

Display results

F1 score and confusion matrix:

labels = outlier_batch.target_names
y_pred = od_preds['data']['is_outlier']
f1 = f1_score(y_outlier, y_pred)
print('F1 score: {:.4f}'.format(f1))
cm = confusion_matrix(y_outlier, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()

Plot instance level outlier scores vs. the outlier threshold:

plot_instance_score(od_preds, y_outlier, labels, od.threshold)

We can see that the isolation forest does a poor job of detecting one type of outlier, whose outlier scores lie around 0. This makes it hard to infer a good threshold without explicit knowledge about the outliers. Setting the threshold just below 0 would lead to significantly better detector performance for the outliers in the dataset. This is also reflected by the ROC curve:

roc_data = {'IF': {'scores': od_preds['data']['instance_score'], 'labels': y_outlier}}
plot_roc(roc_data)
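
When labels are available for a validation batch, the sensitivity of the detector to the threshold can be checked by sweeping candidate thresholds and computing the F1 score for each. The sketch below does this on synthetic scores. The data and the helper f1_at_threshold are illustrative assumptions, not output of the isolation forest.

```python
import numpy as np

def f1_at_threshold(scores, y_true, threshold):
    """F1 score when instances with score > threshold are flagged as outliers."""
    y_pred = scores > threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

rng = np.random.default_rng(0)
y_val_toy = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])
scores_val_toy = np.concatenate([rng.normal(-0.2, 0.05, 900),
                                 rng.normal(0.05, 0.05, 100)])

candidates = np.linspace(scores_val_toy.min(), scores_val_toy.max(), 50)
f1s = np.array([f1_at_threshold(scores_val_toy, y_val_toy, t) for t in candidates])
best_threshold = candidates[f1s.argmax()]
print('Best threshold: {:.3f} -- F1: {:.3f}'.format(best_threshold, f1s.max()))
```

Such a sweep is only possible with labelled outliers, which is exactly the knowledge that is usually missing in practice.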

Likelihood Ratio Outlier Detection on Genomic Sequences

Method

The outlier detector described by Ren et al. (2019) in Likelihood Ratios for Out-of-Distribution Detection uses the likelihood ratio between 2 generative models as the outlier score. One model is trained on the original data while the other is trained on a perturbed version of the dataset. This is based on the observation that the likelihood score for an instance under a generative model can be heavily affected by population level background statistics. The second generative model is therefore trained to capture the background statistics still present in the perturbed data while the semantic features have been erased by the perturbations.
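
Written out, with $p_s$ denoting the model trained on the original data and $p_b$ the model trained on the perturbed data, the score can be expressed as follows. The symbols are notation chosen here to match the logp_s and logp_b variables used in the example. The sign of the outlier score is an assumption consistent with the ROC code in this example, which passes the negative likelihood ratio as the score.

```latex
\mathrm{LLR}(x) = \log p_s(x) - \log p_b(x) = \log \frac{p_s(x)}{p_b(x)},
\qquad
\text{outlier score}(x) = -\mathrm{LLR}(x)
```

Contributions that both models explain equally well cancel in the difference, so the ratio is driven mainly by the features that only the first model has learned.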

The perturbations are sampled independently for each feature from a Bernoulli distribution with rate $\mu$. A perturbed feature is substituted with one of the other possible feature values with equal probability. Each feature in the genome dataset can take 4 values (one of the ACGT nucleobases), so a perturbed feature is swapped with one of the other 3 nucleobases. The generative model used in the example is a simple LSTM network.
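
The perturbation scheme can be sketched in a few lines of NumPy. The function mutate_sketch below is a hypothetical illustration of the idea and not the mutation function used by the library. It assumes the features are integer-encoded with values 0 to n_values - 1.

```python
import numpy as np

def mutate_sketch(X, rate, n_values, seed=0):
    """Replace each feature with probability `rate` by a different value."""
    rng = np.random.default_rng(seed)
    # Bernoulli mask: which features are perturbed
    mask = rng.random(X.shape) < rate
    # an offset in [1, n_values - 1] modulo n_values always yields a
    # different value, each of the other values being equally likely
    offset = rng.integers(1, n_values, size=X.shape)
    X_mut = np.where(mask, (X + offset) % n_values, X)
    return X_mut.astype(X.dtype)

rng = np.random.default_rng(1)
X_seq = rng.integers(0, 4, size=(100, 250))  # toy ACGT sequences encoded as 0..3
X_pert = mutate_sketch(X_seq, rate=0.2, n_values=4)
frac_changed = (X_pert != X_seq).mean()
print('Fraction of perturbed features: {:.3f}'.format(frac_changed))
```

Because a perturbed feature never keeps its value, the fraction of changed features is close to the rate itself.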

Dataset

The bacteria genomics dataset for out-of-distribution detection was released as part of the Likelihood Ratios for Out-of-Distribution Detection paper. From the original TL;DR: The dataset contains genomic sequences of 250 base pairs from 10 in-distribution bacteria classes for training, 60 OOD bacteria classes for validation, and another 60 different OOD bacteria classes for test. There are respectively 1, 7 and again 7 million sequences in the training, validation and test sets. For detailed info on the dataset check the README.

This notebook requires the seaborn package for visualization which can be installed via pip:

!pip install seaborn

import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, LSTM

from alibi_detect.od import LLR
from alibi_detect.datasets import fetch_genome
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.saving import save_detector, load_detector
from alibi_detect.utils.visualize import plot_roc

Load genome data

X represents the genome sequences and y whether they are outliers ($1$) or not ($0$).

(X_train, y_train), (X_val, y_val), (X_test, y_test) = \
        fetch_genome(return_X_y=True, return_labels=False)
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape)

There are no outliers in the training set and a majority of outliers (compared to the training data) in the validation and test sets:

print('Fraction of outliers in train, val and test sets: '
      '{:.2f}, {:.2f} and {:.2f}'.format(y_train.mean(), y_val.mean(), y_test.mean()))

Define model

We need to define a generative model which models the genome sequences. We follow the paper and opt for a simple LSTM. Note that we don't actually need to define the model below if we simply load the pretrained detector later on:

genome_dim = 249  # not 250 b/c we use 1->249 as input and 2->250 as target
input_dim = 4  # ACGT nucleobases
hidden_dim = 2000

inputs = Input(shape=(genome_dim,), dtype=tf.int8)
x = tf.one_hot(tf.cast(inputs, tf.int32), input_dim)
x = LSTM(hidden_dim, return_sequences=True)(x)
logits = Dense(input_dim, activation=None)(x)
model = tf.keras.Model(inputs=inputs, outputs=logits, name='LlrLSTM')
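
The comment on genome_dim refers to the usual next-token setup for sequence models: positions 1 to 249 serve as input and positions 2 to 250 as targets. A minimal sketch of that split on toy data is shown below. How the detector performs the split internally is not shown in this example, so treat this as an illustration of the idea only.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.integers(0, 4, size=(3, 250)).astype(np.int8)  # 3 toy sequences

X_in = X_toy[:, :-1]     # positions 1..249 are fed to the model
X_target = X_toy[:, 1:]  # positions 2..250 are predicted
print(X_in.shape, X_target.shape)
```

Each target is the nucleobase that follows the corresponding input position, so the model learns the distribution of the next base given the bases seen so far.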

We also need to define our loss function which we can utilize to evaluate the log-likelihood for the outlier detector:

def loss_fn(y, x):
    y = tf.one_hot(tf.cast(y, tf.int32), 4)  # ACGT one-hot encoding
    return tf.nn.softmax_cross_entropy_with_logits(y, x, axis=-1)

def likelihood_fn(y, x):
    return -loss_fn(y, x)
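
The relationship between logits, cross-entropy and log-likelihood can be checked with a NumPy sketch. The negative cross-entropy at each position is the log-probability the model assigns to the observed nucleobase. Summing over positions to obtain a sequence-level log-likelihood is an assumption made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sequence_log_likelihood(targets, logits):
    """Sum of per-position log-probabilities of the observed targets."""
    logp = log_softmax(logits)
    picked = np.take_along_axis(logp, targets[..., None], axis=-1)[..., 0]
    return picked.sum(axis=-1)

toy_targets = np.array([[0, 1, 2]])
uniform_logits = np.zeros((1, 3, 4))           # model with no preference
confident_logits = 10.0 * np.eye(4)[toy_targets]  # model favouring the targets

ll_uniform = sequence_log_likelihood(toy_targets, uniform_logits)
ll_confident = sequence_log_likelihood(toy_targets, confident_logits)
print(ll_uniform, ll_confident)
```

A model without any preference assigns probability 1/4 to each base, so the log-likelihood of a sequence of length 3 equals 3 log(1/4), while a model that favours the observed bases scores closer to 0.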

Load or train the outlier detector

We can again either fetch the pretrained detector from a Google Cloud Bucket or train one from scratch:

load_pretrained = True

filepath = os.path.join(os.getcwd(), 'my_path')  # change to download directory
detector_type = 'outlier'
dataset = 'genome'
detector_name = 'LLR'
filepath = os.path.join(filepath, detector_name)
if load_pretrained:  # load pretrained outlier detector
    od = fetch_detector(filepath, detector_type, dataset, detector_name)
else:
    # initialize detector
    od = LLR(threshold=None, model=model, log_prob=likelihood_fn, sequential=True)
    
    # train
    od.fit(
        X_train,
        mutate_fn_kwargs=dict(rate=.2, feature_range=(0,3)),
        mutate_batch_size=1000,
        loss_fn=loss_fn,
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
        epochs=20,
        batch_size=100,
        verbose=False
    )
    
    # save the trained outlier detector
    save_detector(od, filepath)

Compare the log likelihoods

Let's compare the log likelihoods of the inliers vs. the outlier test set data under the semantic and background models. We randomly sample $100,000$ instances from both distributions since the full test set contains $7,000,000$ genomic sequences. The histograms show that the generative model does not distinguish well between inliers and outliers.

idx_in, idx_ood = np.where(y_test == 0)[0], np.where(y_test == 1)[0]
n_in, n_ood = idx_in.shape[0], idx_ood.shape[0]
n_sample = 100000  # sample 100k inliers and outliers each
sample_in = np.random.choice(n_in, size=n_sample, replace=False)
sample_ood = np.random.choice(n_ood, size=n_sample, replace=False)
X_test_in, X_test_ood = X_test[idx_in[sample_in]], X_test[idx_ood[sample_ood]]
y_test_in, y_test_ood = y_test[idx_in[sample_in]], y_test[idx_ood[sample_ood]]
X_test_sample = np.concatenate([X_test_in, X_test_ood])
y_test_sample = np.concatenate([y_test_in, y_test_ood])
print(X_test_in.shape, X_test_ood.shape)

# semantic model
logp_s_in = od.logp_alt(od.dist_s, X_test_in, batch_size=100)
logp_s_ood = od.logp_alt(od.dist_s, X_test_ood, batch_size=100)
logp_s = np.concatenate([logp_s_in, logp_s_ood])
# background model
logp_b_in = od.logp_alt(od.dist_b, X_test_in, batch_size=100)
logp_b_ood = od.logp_alt(od.dist_b, X_test_ood, batch_size=100)
logp_b = np.concatenate([logp_b_in, logp_b_ood])

# show histograms
plt.hist(logp_s_in, bins=100, label='in');
plt.hist(logp_s_ood, bins=100, label='ood');
plt.title('Semantic Log Probabilities')
plt.legend()
plt.show()

plt.hist(logp_b_in, bins=100, label='in');
plt.hist(logp_b_ood, bins=100, label='ood');
plt.title('Background Log Probabilities')
plt.legend()
plt.show()

This is because of the background effect, which in this case is the GC-content of the genomic sequences. This effect is partially reduced when taking the likelihood ratio:

llr_in = logp_s_in - logp_b_in
llr_ood = logp_s_ood - logp_b_ood

plt.hist(llr_in, bins=100, label='in');
plt.hist(llr_ood, bins=100, label='ood');
plt.title('Likelihood Ratio')
plt.legend()
plt.show()

llr = np.concatenate([llr_in, llr_ood])
roc_data = {'LLR': {'scores': -llr, 'labels': y_test_sample}}
plot_roc(roc_data)
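
The area under the ROC curve can be read as the probability that a randomly chosen outlier receives a higher score than a randomly chosen inlier. The sketch below computes it by direct pairwise comparison on a toy example, which also shows why the sign of the score matters. The function roc_auc_sketch is illustrative and only suitable for small samples, since it compares all pairs.

```python
import numpy as np

def roc_auc_sketch(scores, labels):
    """AUC as P(score of outlier > score of inlier), ties counted as one half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

toy_labels = np.array([0, 0, 0, 1, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.3])

auc = roc_auc_sketch(toy_scores, toy_labels)
auc_flipped = roc_auc_sketch(-toy_scores, toy_labels)
print('AUC: {:.3f} -- AUC with flipped sign: {:.3f}'.format(auc, auc_flipped))
```

Flipping the sign of the scores turns an AUC of a into 1 - a, which is why the negative likelihood ratio is passed as the outlier score.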

Detect outliers

We follow the same procedure with the outlier detector. First we need to set an outlier threshold with infer_threshold. We need to pass a batch of instances and specify what percentage of those we consider to be normal via threshold_perc. Let's assume we have a small batch of data with roughly $30$% outliers but we don't know exactly which ones.

n, frac_outlier = 1000, .3
perc_outlier = 100 * frac_outlier
n_sample_in, n_sample_ood = int(n * (1 - frac_outlier)), int(n * frac_outlier)
idx_in, idx_ood = np.where(y_val == 0)[0], np.where(y_val == 1)[0]
n_in, n_ood = idx_in.shape[0], idx_ood.shape[0]
sample_in = np.random.choice(n_in, size=n_sample_in, replace=False)
sample_ood = np.random.choice(n_ood, size=n_sample_ood, replace=False)
X_thr_in, X_thr_ood = X_val[idx_in[sample_in]], X_val[idx_ood[sample_ood]]
X_threshold = np.concatenate([X_thr_in, X_thr_ood])
print(X_threshold.shape)

od.infer_threshold(X_threshold, threshold_perc=100-perc_outlier, batch_size=100)
print('New threshold: {}'.format(od.threshold))

Let's save the outlier detector with updated threshold:

save_detector(od, filepath)

Let's predict outliers on a sample of the test set:

od_preds = od.predict(X_test_sample, batch_size=100)

Display results

F1 score, accuracy, precision, recall and confusion matrix:

y_pred = od_preds['data']['is_outlier']
labels = ['normal', 'outlier']
f1 = f1_score(y_test_sample, y_pred)
acc = accuracy_score(y_test_sample, y_pred)
prec = precision_score(y_test_sample, y_pred)
rec = recall_score(y_test_sample, y_pred)
print('F1 score: {:.3f} -- Accuracy: {:.3f} -- Precision: {:.3f} '
      '-- Recall: {:.3f}'.format(f1, acc, prec, rec))
cm = confusion_matrix(y_test_sample, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()
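
The reported metrics all follow from the four cells of the confusion matrix. The sketch below recomputes them by hand on a small toy example. The labels and predictions are made up for illustration, and the matrix layout follows the scikit-learn convention of true classes as rows and predicted classes as columns.

```python
import numpy as np

y_true_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred_toy = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0])

tp = int(np.sum((y_true_toy == 1) & (y_pred_toy == 1)))  # outliers correctly flagged
fp = int(np.sum((y_true_toy == 0) & (y_pred_toy == 1)))  # normal instances flagged
fn = int(np.sum((y_true_toy == 1) & (y_pred_toy == 0)))  # outliers missed
tn = int(np.sum((y_true_toy == 0) & (y_pred_toy == 0)))  # normal instances passed

cm_toy = np.array([[tn, fp], [fn, tp]])
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_toy = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / len(y_true_toy)
print(cm_toy)
print('F1 score: {:.3f} -- Accuracy: {:.3f} -- Precision: {:.3f} '
      '-- Recall: {:.3f}'.format(f1_toy, accuracy, precision, recall))
```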

We can also plot the ROC curve based on the instance level outlier scores:

roc_data = {'LLR': {'scores': od_preds['data']['instance_score'], 'labels': y_test_sample}}
plot_roc(roc_data)

Likelihood Ratio Outlier Detection with PixelCNN++

Method

The generative model used in the example is a PixelCNN++, adapted from the official TensorFlow Probability implementation and available as a standalone model via from alibi_detect.models.tensorflow import PixelCNN.

Dataset

The training set Fashion-MNIST consists of 60,000 28 by 28 grayscale images distributed over 10 classes. The classes represent items of clothing such as shirts or trousers. At test time, we want to distinguish the Fashion-MNIST test set from MNIST, which consists of 28 by 28 grayscale images of the digits 0 to 9.

This notebook requires the seaborn package for visualization which can be installed via pip:

!pip install seaborn

import os
from functools import partial
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import tensorflow as tf

from alibi_detect.od import LLR
from alibi_detect.models.tensorflow import PixelCNN
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.saving import save_detector, load_detector
from alibi_detect.utils.tensorflow import predict_batch
from alibi_detect.utils.visualize import plot_roc

Utility Functions

def load_data(dataset: str) -> tuple:
    if dataset == 'mnist':
        (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    elif dataset == 'fashion_mnist':
        (X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    else:
        raise NotImplementedError
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    y_train = y_train.astype('int64').reshape(-1,)
    y_test = y_test.astype('int64').reshape(-1,)
    if len(X_train.shape) == 3:
        shape = (-1,) + X_train.shape[1:] + (1,)
        X_train = X_train.reshape(shape)
        X_test = X_test.reshape(shape)
    return (X_train, y_train), (X_test, y_test)


def plot_grid_img(X: np.ndarray, figsize: tuple = (10, 6)) -> None:
    n = X.shape[0]
    nrows = int(n**.5)
    ncols = int(np.ceil(n / nrows))
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize)
    n_subplot = 1
    for r in range(nrows):
        for c in range(ncols):
            plt.subplot(nrows, ncols, n_subplot)
            plt.axis('off')
            plt.imshow(X[n_subplot-1, :, :, 0])
            n_subplot += 1
            

def plot_grid_logp(idx: list, X: np.ndarray, logp_s: np.ndarray, 
                   logp_b: np.ndarray, figsize: tuple = (10, 6)) -> None:
    nrows, ncols = len(idx), 4
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize)
    n_subplot = 1
    for r in range(nrows):
        plt.subplot(nrows, ncols, n_subplot)
        plt.imshow(X[idx[r], :, :, 0])
        plt.colorbar()
        plt.axis('off')
        if r == 0:
            plt.title('Image')
        n_subplot += 1

        plt.subplot(nrows, ncols, n_subplot)
        plt.imshow(logp_s[idx[r], :, :])
        plt.colorbar()
        plt.axis('off')
        if r == 0:
            plt.title('Semantic Logp')
        n_subplot += 1

        plt.subplot(nrows, ncols, n_subplot)
        plt.imshow(logp_b[idx[r], :, :])
        plt.colorbar()
        plt.axis('off')
        if r == 0:
            plt.title('Background Logp')
        n_subplot += 1

        plt.subplot(nrows, ncols, n_subplot)
        plt.imshow(logp_s[idx[r], :, :] - logp_b[idx[r], :, :])
        plt.colorbar()
        plt.axis('off')
        if r == 0:
            plt.title('LLR')
        n_subplot += 1

Load data

The in-distribution dataset is Fashion-MNIST and the out-of-distribution dataset we'd like to detect is MNIST.

(X_train_in, y_train_in), (X_test_in, y_test_in) = load_data('fashion_mnist')
X_test_ood, y_test_ood = load_data('mnist')[1]
input_shape = X_train_in.shape[1:]
print(X_train_in.shape, X_test_in.shape, X_test_ood.shape)

i = 0
plt.imshow(X_train_in[i].reshape(input_shape[:-1]))
plt.title('Fashion-MNIST')
plt.axis('off')
plt.show();
plt.imshow(X_test_ood[i].reshape(input_shape[:-1]))
plt.title('MNIST')
plt.axis('off')
plt.show();

Define PixelCNN++ model

We now need to define our generative model. This is not necessary if the pretrained detector is later loaded from the Google Cloud Bucket.

Key PixelCNN++ arguments in a nutshell:

num_resnet: number of layers (Fig.2 PixelCNN) within each hierarchical block (Fig.2 PixelCNN++).
num_hierarchies: number of blocks separated by expansions or contractions of dimensions. See Fig.2 PixelCNN++.
num_filters: number of convolutional filters.
num_logistic_mix: number of components in the logistic mixture distribution.
receptive_field_dims: height and width in pixels of the receptive field above and to the left of a given pixel.

Optionally, a different model can be passed to the detector with argument model_background. The Likelihood Ratio paper mentions that additional $L2$-regularization (l2_weight) for the background model could improve detection performance.

model = PixelCNN(
    image_shape=input_shape,
    num_resnet=5,
    num_hierarchies=2,
    num_filters=32,
    num_logistic_mix=1,
    receptive_field_dims=(3, 3),
    dropout_p=.3,
    l2_weight=0.
)

Load or train the outlier detector

We can again either fetch the pretrained detector from a Google Cloud Bucket or train one from scratch:

load_pretrained = True

filepath = os.path.join(os.getcwd(), 'my_path')  # change to download directory
detector_type = 'outlier'
dataset = 'fashion_mnist'
detector_name = 'LLR'
filepath = os.path.join(filepath, detector_name)    
if load_pretrained:  # load pretrained outlier detector
    od = fetch_detector(filepath, detector_type, dataset, detector_name)
else:
    # initialize detector
    od = LLR(threshold=None, model=model)
    
    # train
    od.fit(
        X_train_in,
        mutate_fn_kwargs=dict(rate=.2),
        mutate_batch_size=1000,
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        epochs=20,
        batch_size=32,
        verbose=False
    )
    
    # save the trained outlier detector
    save_detector(od, filepath)

We can load our saved detector again by defining the PixelCNN architectures for the semantic and background models as well as providing the shape of the input data:

kwargs = {'dist_s': model, 'dist_b': model.copy(), 'input_shape': input_shape}
od = load_detector(filepath, **kwargs)

Let's sample some instances from the semantic model to check how good our generative model is:

n_sample = 16
X_sample = od.dist_s.sample(n_sample).numpy()

plot_grid_img(X_sample)

Most of the instances look like they represent the dataset well. When we do the same thing for our background model, we see that there is some background noise injected:

X_sample = od.dist_b.sample(n_sample).numpy()

plot_grid_img(X_sample)

Compare the log likelihoods

Let's compare the log likelihoods of the inliers vs. the outlier data under the semantic and background models. Although MNIST data looks very distinct from Fashion-MNIST, the generative model does not distinguish well between the 2 datasets as shown by the histograms of the log likelihoods:

shape_in, shape_ood = X_test_in.shape[0], X_test_ood.shape[0]

# semantic model
logp_s_in = predict_batch(X_test_in, od.dist_s.log_prob, batch_size=32, shape=shape_in)
logp_s_ood = predict_batch(X_test_ood, od.dist_s.log_prob, batch_size=32, shape=shape_ood)
logp_s = np.concatenate([logp_s_in, logp_s_ood])
# background model
logp_b_in = predict_batch(X_test_in, od.dist_b.log_prob, batch_size=32, shape=shape_in)
logp_b_ood = predict_batch(X_test_ood, od.dist_b.log_prob, batch_size=32, shape=shape_ood)

# show histograms
plt.hist(logp_s_in, bins=100, label='in');
plt.hist(logp_s_ood, bins=100, label='ood');
plt.title('Semantic Log Probabilities')
plt.legend()
plt.show()

plt.hist(logp_b_in, bins=100, label='in');
plt.hist(logp_b_ood, bins=100, label='ood');
plt.title('Background Log Probabilities')
plt.legend()
plt.show()

This is due to the dominance of the background, which is similar in both datasets (basically lots of $0$'s). However, if we take the likelihood ratio, the MNIST data are detected as outliers. This is exactly what the outlier detector does as well:

llr_in = logp_s_in - logp_b_in
llr_ood = logp_s_ood - logp_b_ood

plt.hist(llr_in, bins=100, label='in');
plt.hist(llr_ood, bins=100, label='ood');
plt.title('Likelihood Ratio')
plt.legend()
plt.show()

Detect outliers

We follow the same procedure with the outlier detector. First we need to set an outlier threshold with infer_threshold. We need to pass a batch of instances and specify what percentage of those we consider to be normal via threshold_perc. Let's assume we have a small batch of data with roughly $50$% outliers but we don't know exactly which ones.

n, frac_outlier = 500, .5
perc_outlier = 100 * frac_outlier
n_in, n_ood = int(n * (1 - frac_outlier)), int(n * frac_outlier)
idx_in = np.random.choice(shape_in, size=n_in, replace=False)
idx_ood = np.random.choice(shape_ood, size=n_ood, replace=False)
X_threshold = np.concatenate([X_test_in[idx_in], X_test_ood[idx_ood]])

od.infer_threshold(X_threshold, threshold_perc=perc_outlier, batch_size=32)
print('New threshold: {}'.format(od.threshold))

Let's save the outlier detector with updated threshold:

save_detector(od, filepath)

Let's now predict outliers on the combined Fashion-MNIST and MNIST datasets:

X_test = np.concatenate([X_test_in, X_test_ood])
y_test = np.concatenate([np.zeros(X_test_in.shape[0]), np.ones(X_test_ood.shape[0])])
print(X_test.shape, y_test.shape)

od_preds = od.predict(X_test,
                      batch_size=32,
                      outlier_type='instance',    # use 'feature' or 'instance' level
                      return_feature_score=True,  # scores used to determine outliers
                      return_instance_score=True)

Display results

F1 score, accuracy, precision, recall and confusion matrix:

y_pred = od_preds['data']['is_outlier']
labels = ['normal', 'outlier']
f1 = f1_score(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
print('F1 score: {:.3f} -- Accuracy: {:.3f} -- Precision: {:.3f} '
      '-- Recall: {:.3f}'.format(f1, acc, prec, rec))
cm = confusion_matrix(y_test, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()

We can also plot the ROC curve based on the instance level outlier scores and compare it with the likelihood of only the semantic model:

roc_data = {
    'LLR': {'scores': od_preds['data']['instance_score'], 'labels': y_test},
    'Likelihood': {'scores': -logp_s, 'labels': y_test}  # negative b/c outlier score
}
plot_roc(roc_data)

Analyse feature scores

To understand why the likelihood ratio works to detect outliers but the raw log likelihoods don't, it is helpful to look at the pixel-wise log likelihoods of both the semantic and background models.

n_plot = 5

# semantic model
logp_fn_s = partial(od.dist_s.log_prob, return_per_feature=True)
logp_s_pixel_in = predict_batch(X_test_in[:n_plot], logp_fn_s, batch_size=32)
logp_s_pixel_ood = predict_batch(X_test_ood[:n_plot], logp_fn_s, batch_size=32)

# background model
logp_fn_b = partial(od.dist_b.log_prob, return_per_feature=True)
logp_b_pixel_in = predict_batch(X_test_in[:n_plot], logp_fn_b, batch_size=32)
logp_b_pixel_ood = predict_batch(X_test_ood[:n_plot], logp_fn_b, batch_size=32)

# pixel-wise likelihood ratios
llr_pixel_in = logp_s_pixel_in - logp_b_pixel_in
llr_pixel_ood = logp_s_pixel_ood - logp_b_pixel_ood

Plot in-distribution instances:

idx = list(np.arange(n_plot))
plot_grid_logp(idx, X_test_in, logp_s_pixel_in, logp_b_pixel_in, figsize=(14,14))

It is clear that both the semantic and background model attach high probabilities to the background pixels. This effect is cancelled out in the likelihood ratio in the last column. The same applies to the out-of-distribution instances:

idx = list(np.arange(n_plot))
plot_grid_logp(idx, X_test_ood, logp_s_pixel_ood, logp_b_pixel_ood, figsize=(14,14))
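
The cancellation of the background can be reproduced with a toy calculation. The sketch below uses made-up pixel-wise log-probabilities in which both models assign the same high probability to background pixels and differ only on foreground pixels. All numbers are assumptions chosen for illustration and are not outputs of the PixelCNN++ models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 28 * 28
is_background = rng.random(n_pixels) < 0.8  # toy mask, most pixels are background

logp_background_pixel = -0.1  # both models find background pixels easy
toy_logp_s = np.where(is_background, logp_background_pixel, -3.0)  # semantic model
toy_logp_b = np.where(is_background, logp_background_pixel, -5.0)  # background model

toy_llr_pixel = toy_logp_s - toy_logp_b
print('Background contribution to LLR:', toy_llr_pixel[is_background].sum())
print('Foreground contribution to LLR:', toy_llr_pixel[~is_background].sum())
```

Whatever the two models agree on drops out of the difference, so the likelihood ratio depends only on the pixels where they disagree.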

Mahalanobis outlier detection on KDD Cup '99 dataset

Method

Dataset

There are 4 types of attacks in the dataset:

DOS: denial-of-service, e.g. syn flood;
R2L: unauthorized access from a remote machine, e.g. guessing password;
U2R: unauthorized access to local superuser (root) privileges;
probing: surveillance and other probing, e.g., port scanning.

The dataset contains about 5 million connection records.

There are 3 types of features:

basic features of individual connections, e.g. duration of connection
content features within a connection, e.g. number of failed log in attempts
traffic features within a 2 second window, e.g. number of connections to the same host as the current connection

This notebook requires the seaborn package for visualization which can be installed via pip:

!pip install seaborn

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

from alibi_detect.od import Mahalanobis
from alibi_detect.datasets import fetch_kdd
from alibi_detect.utils.data import create_outlier_batch
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.utils.mapping import ord2ohe
from alibi_detect.saving import save_detector, load_detector
from alibi_detect.utils.visualize import plot_instance_score, plot_roc

Load dataset

We keep only 18 of the 41 features, all of them continuous.

kddcup = fetch_kdd(percent10=True)  # only load 10% of the dataset
print(kddcup.data.shape, kddcup.target.shape)

Assume that a machine learning model is trained on normal instances of the dataset (not outliers) and standardization is applied:

np.random.seed(0)
normal_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=100000, perc_outlier=0)
X_train, y_train = normal_batch.data.astype('float'), normal_batch.target
print(X_train.shape, y_train.shape)
print('{}% outliers'.format(100 * y_train.mean()))

mean, stdev = X_train.mean(axis=0), X_train.std(axis=0)

Define outlier detector

We train an outlier detector from scratch.

Be aware that Mahalanobis is an online, stateful outlier detector. Saving or loading a Mahalanobis detector therefore also saves and loads the state of the detector. This allows the user to warm up the detector before deploying it into production.
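
What "online and stateful" means can be illustrated with a small tracker that updates a running mean and covariance one observation at a time and scores new points against that state. This is a simplified sketch using a Welford-style update. It omits the PCA projection, clipping and max_n behaviour of the actual detector, and the class OnlineMahalanobisSketch is hypothetical.

```python
import numpy as np

class OnlineMahalanobisSketch:
    """Running mean and covariance with a Mahalanobis distance score."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = np.zeros(n_features)
        self.m2 = np.zeros((n_features, n_features))  # accumulated outer products

    def update(self, x):
        # Welford-style update of the mean and the scatter matrix
        self.n += 1
        delta = x - self.mean
        self.mean = self.mean + delta / self.n
        self.m2 = self.m2 + np.outer(delta, x - self.mean)

    def cov(self):
        if self.n < 2:
            raise ValueError('need at least 2 observations')
        return self.m2 / (self.n - 1)

    def score(self, x, eps=1e-6):
        # small ridge term keeps the covariance invertible
        cov = self.cov() + eps * np.eye(len(x))
        diff = x - self.mean
        return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(0)
X_stream = rng.normal(size=(500, 3))
tracker = OnlineMahalanobisSketch(n_features=3)
for x_t in X_stream:
    tracker.update(x_t)

score_typical = tracker.score(np.zeros(3))
score_far = tracker.score(np.full(3, 10.0))
print('Typical: {:.2f} -- Far away: {:.2f}'.format(score_typical, score_far))
```

The state consists only of the count, the mean and the scatter matrix, which is why saving the detector after it has seen data preserves the warm-up.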

filepath = 'my_path'  # change to directory where model is saved
detector_name = 'Mahalanobis'
filepath = os.path.join(filepath, detector_name)

# initialize and save outlier detector
threshold = None  # scores above threshold are classified as outliers   
n_components = 2  # nb of components used in PCA
std_clip = 3  # clip values used to compute mean and cov above "std_clip" standard deviations
start_clip = 20  # start clipping values after "start_clip" instances

od = Mahalanobis(threshold, 
                 n_components=n_components,
                 std_clip=std_clip,
                 start_clip=start_clip)

save_detector(od, filepath)  # save outlier detector
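
The effect of std_clip can be sketched as clipping each feature of a new observation to a band around the running mean before the statistics are updated. The helper clip_to_std below is a hypothetical illustration of that idea, with made-up running statistics, and does not reproduce the detector's internals.

```python
import numpy as np

def clip_to_std(x, mean_run, stdev_run, std_clip=3.0):
    """Clip each feature to mean +/- std_clip standard deviations."""
    lower = mean_run - std_clip * stdev_run
    upper = mean_run + std_clip * stdev_run
    return np.clip(x, lower, upper)

mean_run = np.array([0.0, 10.0])   # running feature means (toy values)
stdev_run = np.array([1.0, 2.0])   # running feature standard deviations
x_new = np.array([8.0, 11.0])      # first feature is far from its mean

x_clipped = clip_to_std(x_new, mean_run, stdev_run, std_clip=3.0)
print(x_clipped)  # [ 3. 11.]
```

Clipping limits how much a single extreme observation can shift the mean and covariance. Delaying it until start_clip instances have been seen avoids clipping against statistics that are still unreliable.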

np.random.seed(0)
perc_outlier = 5
threshold_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=perc_outlier)
X_threshold, y_threshold = threshold_batch.data.astype('float'), threshold_batch.target
X_threshold = (X_threshold - mean) / stdev
print('{}% outliers'.format(100 * y_threshold.mean()))

od.infer_threshold(X_threshold, threshold_perc=100-perc_outlier)
print('New threshold: {}'.format(od.threshold))
threshold = od.threshold

Detect outliers

We now generate a batch of data with 10% outliers, standardize those with the mean and stdev values obtained from the normal data (inliers) and detect the outliers in the batch.

#| tags: []
np.random.seed(1)
outlier_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=10)
X_outlier, y_outlier = outlier_batch.data.astype('float'), outlier_batch.target
X_outlier = (X_outlier - mean) / stdev
print(X_outlier.shape, y_outlier.shape)
print('{}% outliers'.format(100 * y_outlier.mean()))

Predict outliers:

#| tags: []
od_preds = od.predict(X_outlier, return_instance_score=True)

We can now save the warmed up outlier detector:

#| tags: []
save_detector(od, filepath)

Display results

F1 score and confusion matrix:

#| tags: []
labels = outlier_batch.target_names
y_pred = od_preds['data']['is_outlier']
f1 = f1_score(y_outlier, y_pred)
print('F1 score: {}'.format(f1))
cm = confusion_matrix(y_outlier, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()

Plot instance level outlier scores vs. the outlier threshold:

#| tags: []
plot_instance_score(od_preds, y_outlier, labels, od.threshold, ylim=(0,50))

We can also plot the ROC curve for the outlier scores of the detector:

#| tags: []
roc_data = {'MD': {'scores': od_preds['data']['instance_score'], 'labels': y_outlier}}
plot_roc(roc_data)

Include categorical variables

So far we only tracked continuous variables. We can however also include categorical variables. The fit step first computes pairwise distances between the categories of each categorical variable. The pairwise distances are based on either the model predictions (MVDM method) or the context provided by the other variables in the dataset (ABDM method). For MVDM, we use the difference between the conditional model prediction probabilities of each category. This method is based on the Modified Value Difference Metric (MVDM) by Cost et al (1993). ABDM stands for Association-Based Distance Metric, a categorical distance measure introduced by Le et al (2005). ABDM infers context from the presence of other variables in the data and computes a dissimilarity measure based on the Kullback-Leibler divergence. Both methods can also be combined as ABDM-MVDM. We can then apply multidimensional scaling to project the pairwise distances into Euclidean space.
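
As a rough illustration of the MVDM idea, the sketch below estimates the conditional label distribution for each category and uses the summed absolute difference between two such distributions as the distance between the categories. The function name `mvdm_distances` and the toy data are assumptions made for this example; the multidimensional scaling step and the ABDM variant are not shown.

```python
import numpy as np


def mvdm_distances(categories, labels):
    """Pairwise distances between categories from conditional label frequencies.

    Simplified MVDM-style sketch: d(a, b) = sum_c |P(c | a) - P(c | b)|.
    """
    cats = np.unique(categories)
    classes = np.unique(labels)
    # Row i holds P(class | category i), estimated by counting
    cond = np.array([[np.mean(labels[categories == a] == c) for c in classes]
                     for a in cats])
    # L1 distance between every pair of conditional distributions
    return np.abs(cond[:, None, :] - cond[None, :, :]).sum(axis=-1)


# Toy data: categories 0 and 1 behave alike, category 2 mostly has the other label
categories = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
labels = np.array([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0])
d = mvdm_distances(categories, labels)
```

Categories that co-occur with the same labels end up at distance zero, while categories with different label profiles are pushed apart.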

Load and transform data

#| tags: []
cat_cols = ['protocol_type', 'service', 'flag']
num_cols = ['srv_count', 'serror_rate', 'srv_serror_rate',
            'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 
            'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 
            'dst_host_srv_count', 'dst_host_same_srv_rate', 
            'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate',
            'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 
            'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 
            'dst_host_srv_rerror_rate']
cols = cat_cols + num_cols

#| tags: []
np.random.seed(0)
kddcup = fetch_kdd(keep_cols=cols, percent10=True)
print(kddcup.data.shape, kddcup.target.shape)

Create a dictionary whose keys are the column indices of the categorical variables and whose values are the number of categories for each variable. This dictionary is later passed to the outlier detector at initialization.

#| tags: []
cat_vars_ord = {}
n_categories = len(cat_cols)
for i in range(n_categories):
    cat_vars_ord[i] = len(np.unique(kddcup.data[:, i]))
print(cat_vars_ord)

Fit an ordinal encoder on the categorical data:

#| tags: []
enc = OrdinalEncoder()
enc.fit(kddcup.data[:, :n_categories])

Combine scaled numerical and ordinal features. X_fit will be used to infer distances between categorical features later. To make it easy, we will already transform the whole dataset, including the outliers that need to be detected later. This is for illustrative purposes:

#| tags: []
X_num = (kddcup.data[:, n_categories:] - mean) / stdev  # standardize numerical features
X_ord = enc.transform(kddcup.data[:, :n_categories])  # apply ordinal encoding to categorical features
X_fit = np.c_[X_ord, X_num].astype(np.float32, copy=False)  # combine numerical and categorical features
print(X_fit.shape)

Initialize and fit outlier detector

We use the same threshold as for the continuous data. This will likely not result in optimal performance. Alternatively, you can infer the threshold again.

#| tags: []
n_components = 2
std_clip = 3
start_clip = 20
    
od = Mahalanobis(threshold,
                 n_components=n_components, 
                 std_clip=std_clip, 
                 start_clip=start_clip,
                 cat_vars=cat_vars_ord,
                 ohe=False)  # True if one-hot encoding (OHE) is used

Set fit parameters:

#| tags: []
d_type = 'abdm'  # pairwise distance type, 'abdm' infers context from other variables
disc_perc = [25, 50, 75]  # percentiles used to bin numerical values; used in 'abdm' calculations
standardize_cat_vars = True  # standardize numerical values of categorical variables

Apply fit method to find numerical values for categorical variables:

#| tags: []
od.fit(X_fit,
       d_type=d_type,
       disc_perc=disc_perc,
       standardize_cat_vars=standardize_cat_vars)

The numerical values for the categorical features are stored in the attribute od.d_abs. This is a dictionary whose keys are the columns of the categorical features and whose values are the numerical equivalents of the categories:

#| tags: []
cat = 0  # categorical variable to plot numerical values for

#| tags: []
plt.bar(np.arange(len(od.d_abs[cat])), od.d_abs[cat])
plt.xticks(np.arange(len(od.d_abs[cat])))
plt.title('Numerical values for categories in categorical variable {}'.format(cat))
plt.xlabel('Category')
plt.ylabel('Numerical value')
plt.show()

Another option would be to set d_type to 'mvdm' and y to kddcup.target to infer the numerical values for categorical variables from the model labels (or alternatively the predictions).

Run outlier detector and display results

Generate batch of data with 10% outliers:

#| tags: []
np.random.seed(1)
outlier_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=1000, perc_outlier=10)
data, y_outlier = outlier_batch.data, outlier_batch.target
print(data.shape, y_outlier.shape)
print('{}% outliers'.format(100 * y_outlier.mean()))

Preprocess the outlier batch:

#| tags: []
X_num = (data[:, n_categories:] - mean) / stdev
X_ord = enc.transform(data[:, :n_categories])
X_outlier = np.c_[X_ord, X_num].astype(np.float32, copy=False)
print(X_outlier.shape)

Predict outliers:

#| tags: []
od_preds = od.predict(X_outlier, return_instance_score=True)

F1 score and confusion matrix:

#| tags: []
y_pred = od_preds['data']['is_outlier']
f1 = f1_score(y_outlier, y_pred)
print('F1 score: {}'.format(f1))
cm = confusion_matrix(y_outlier, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()

Plot instance level outlier scores vs. the outlier threshold:

#| tags: []
plot_instance_score(od_preds, y_outlier, labels, od.threshold, ylim=(0, 150))

Use OHE instead of ordinal encoding for the categorical variables

Since we will apply one-hot encoding (OHE) on the categorical variables, we convert cat_vars_ord from the ordinal to OHE format. alibi_detect.utils.mapping contains utility functions to do this. The keys in cat_vars_ohe now represent the first column index for each one-hot encoded categorical variable. This dictionary will later be passed to the outlier detector.

#| tags: []
cat_vars_ohe = ord2ohe(X_fit, cat_vars_ord)[1]
print(cat_vars_ohe)
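
The index shift behind this conversion can be sketched in a few lines. The helper below is hypothetical and written only to illustrate how the keys change; it is not the library's `ord2ohe` and it only returns the dictionary, not the transformed data.

```python
def ord_to_ohe_index(cat_vars_ord):
    """Map {ordinal column: n_categories} to {first OHE column: n_categories}.

    Hypothetical helper for illustration only.
    """
    cat_vars_ohe = {}
    shift = 0  # extra columns introduced by expanding earlier categorical variables
    for col in sorted(cat_vars_ord):
        n_cat = cat_vars_ord[col]
        cat_vars_ohe[col + shift] = n_cat
        shift += n_cat - 1  # one ordinal column becomes n_cat one-hot columns
    return cat_vars_ohe


# Toy example with three categorical columns of 3, 4 and 2 categories
example = ord_to_ohe_index({0: 3, 1: 4, 2: 2})
```

In this toy case the second variable starts at column 3 because the first one now occupies columns 0 to 2, and the third starts at column 7.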

Fit a one-hot encoder on the categorical data:

#| tags: []
enc = OneHotEncoder(categories='auto')
enc.fit(X_fit[:, :n_categories])

Transform X_fit to OHE:

#| tags: []
X_ohe = enc.transform(X_fit[:, :n_categories])
X_fit = np.array(np.c_[X_ohe.todense(), X_fit[:, n_categories:]].astype(np.float32, copy=False))
print(X_fit.shape)

Initialize and fit outlier detector

Initialize:

#| tags: []
od = Mahalanobis(threshold,
                 n_components=n_components, 
                 std_clip=std_clip, 
                 start_clip=start_clip,
                 cat_vars=cat_vars_ohe,
                 ohe=True)

Apply fit method:

#| tags: []
od.fit(X_fit,
       d_type=d_type,
       disc_perc=disc_perc,
       standardize_cat_vars=standardize_cat_vars)

Run outlier detector and display results

Transform outlier batch to OHE:

#| tags: []
X_ohe = enc.transform(X_ord)
X_outlier = np.array(np.c_[X_ohe.todense(), X_num].astype(np.float32, copy=False))
print(X_outlier.shape)

Predict outliers:

#| tags: []
od_preds = od.predict(X_outlier, return_instance_score=True)

F1 score and confusion matrix:

#| tags: []
y_pred = od_preds['data']['is_outlier']
f1 = f1_score(y_outlier, y_pred)
print('F1 score: {}'.format(f1))
cm = confusion_matrix(y_outlier, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()

Plot instance level outlier scores vs. the outlier threshold:

#| tags: []
plot_instance_score(od_preds, y_outlier, labels, od.threshold, ylim=(0,200))

Time-series outlier detection using Prophet on weather data

Method

Note

To use this detector, first install Prophet by running:

Dataset

The example uses a weather time series dataset containing 14 different features such as air temperature, atmospheric pressure, and humidity. These were collected every 10 minutes, beginning in 2003. We only use data collected between 2009 and 2016.

Load dataset

Select subset to test Prophet model on:

Prophet model expects a DataFrame with 2 columns: one named ds with the timestamps and one named y with the time series to be evaluated. We will just look at the temperature data:

Define outlier detector

We train an outlier detector from scratch:

Predict outliers on test data

Define the test data. It is important that the timestamps of the test data follow the training data. We check this below by comparing the first few rows of the test DataFrame with the last few of the training DataFrame:

Predict outliers on test data:

Visualize results

We can first visualize our predictions with Prophet's built in plotting functionality. This also allows us to include historical predictions:

The further we predict into the future, the wider the uncertainty intervals that determine the outlier threshold become.

Let's overlay the actual data with the predicted upper and lower outlier thresholds and check where we predicted outliers:

Outlier scores and predictions:

The outlier scores naturally trend down as uncertainty increases when we predict further in the future.

Let's look at some individual outliers:

Seq2Seq time series outlier detection on ECG data

Method

Dataset

The outlier detector needs to spot anomalies in electrocardiograms (ECGs). The dataset contains 5000 ECGs, originally published under the name BIDMC Congestive Heart Failure Database (chfdb), record chf07. The data has been pre-processed in 2 steps: first each heartbeat is extracted, and then each beat is made equal length via interpolation. The data is labeled and contains 5 classes. The first class, which contains almost 60% of the observations, is treated as normal while the others are treated as outliers. The detector is trained on heartbeats from the first class and needs to flag the other classes as anomalies.

This notebook requires the seaborn package for visualization which can be installed via pip:

Load dataset

Flip train and test data because there are only 500 ECGs in the original training set and 4500 in the test set:

Since we treat the first class as the normal, inlier data and the rest of X_train as outliers, we need to adjust the training (inlier) data and the labels of the test set.

Some of the outliers in X_train are used in combination with some of the inlier instances to infer the threshold level:

Apply min-max scaling between 0 and 1 to the observations using the inlier data:
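
The scaling step can be sketched as follows. The arrays are random stand-ins for the ECG data, and using a single global minimum and maximum (rather than per-timestep values) is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X_inlier = rng.normal(loc=0.0, scale=1.0, size=(100, 140))  # stand-in for inlier heartbeats
X_other = rng.normal(loc=0.0, scale=3.0, size=(20, 140))    # stand-in for other instances

# Scaling statistics come from the inlier data only
xmin, xmax = X_inlier.min(), X_inlier.max()
X_inlier_scaled = (X_inlier - xmin) / (xmax - xmin)
X_other_scaled = (X_other - xmin) / (xmax - xmin)  # may fall outside [0, 1]
```

Only the inlier data is guaranteed to land in the [0, 1] range; instances that differ strongly from the inliers can exceed it, which is expected.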

Reshape the observations to (batch size, sequence length, features) for the detector:

We can now visualize scaled instances from each class:

Load or define Seq2Seq outlier detector

Let's inspect how well the sequence-to-sequence model can predict the ECGs of the inlier and outlier classes. The predictions in the charts below are made on ECGs from the test set:

It is clear that the model can reconstruct the inlier class but struggles with the outliers.

When a model is trained from scratch, the warning thrown at initialization tells us that we need to set the outlier threshold. This can be done with the infer_threshold method. We need to pass a time series of instances and specify what percentage of those we consider to be normal via threshold_perc, equal to the percentage of Class 1 in X_threshold. The outlier_perc parameter defines the percentage of features used to define the outlier threshold. In this example, the number of features considered per instance equals 140 (1 for each timestep). We set outlier_perc to 95, which means that we use the 95% of features with the highest reconstruction error, adjusted by the threshold estimate.

Let's save the outlier detector with the updated threshold:

We can load the same detector via load_detector:

Detect outliers

Display results

F1 score, accuracy, recall and confusion matrix:

We can also plot the ROC curve based on the instance level outlier scores:

Time series outlier detection with Seq2Seq models on synthetic data

Method

Dataset

We test the outlier detector on a synthetic dataset generated with the package. It allows you to generate a wide range of time series (e.g. pseudo-periodic, autoregressive or Gaussian Process generated signals) and noise types (white or red noise). It can be installed as follows:

Additionally, this notebook requires the seaborn package for visualization which can be installed via pip:

Create multivariate time series

Visualize:

Load or define Seq2Seq outlier detector

We still need to set the outlier threshold. This can be done with the infer_threshold method. We need to pass a time series of instances and specify what percentage of those we consider to be normal via threshold_perc. First we create outliers by injecting noise in the time series via inject_outlier_ts. The noise can be regulated via the percentage of outliers (perc_outlier), the strength of the perturbation (n_std) and the minimum size of the noise perturbation (min_std). Let's assume we have some data which we know contains around 10% outliers in either of the features:

Visualize outlier data used to determine the threshold:

Let's infer the threshold. The inject_outlier_ts method distributes perturbations evenly across features. As a result, each feature contains about 5% outliers. We can either set the threshold over both features combined or determine a feature-wise threshold. Here we opt for the feature-wise threshold. This is for instance useful when different features have different variance or sensitivity to outliers. We also manually decrease the threshold a bit to increase the sensitivity of our detector:
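
The difference between a combined and a feature-wise threshold can be illustrated with percentiles over made-up scores. The score array below is hypothetical and the percentile-based rule is a sketch of the idea, not the detector's code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature-level outlier scores with shape (timesteps, features);
# the second feature has a much larger spread than the first
scores = np.abs(rng.normal(size=(1000, 2))) * np.array([1.0, 5.0])

threshold_perc = 95.0
threshold_combined = np.percentile(scores, threshold_perc)             # one value for all features
threshold_featurewise = np.percentile(scores, threshold_perc, axis=0)  # one value per feature

# Fraction of timesteps flagged per feature under each choice
flagged_combined = (scores > threshold_combined).mean(axis=0)
flagged_featurewise = (scores > threshold_featurewise).mean(axis=0)
```

With a single combined threshold, the high-variance feature accounts for nearly all flagged timesteps, whereas the feature-wise thresholds flag the same share of timesteps in each feature.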

Let's save the outlier detector with the updated threshold:

We can load the same detector via load_detector:

Detect outliers

Generate the outliers to detect:

Predict outliers:

Display results

F1 score, accuracy, recall and confusion matrix:

Plot the feature-wise outlier scores of the time series for each timestep vs. the outlier threshold:

We can also plot the ROC curve using the instance level outlier scores:

Time series outlier detection with Spectral Residuals on synthetic data

Method

The Spectral Residual outlier detector is based on the paper and is suitable for unsupervised online anomaly detection in univariate time series data. The algorithm first computes the Fourier Transform of the original data. Then it computes the spectral residual of the log amplitude of the transformed signal before applying the Inverse Fourier Transform to map the sequence back from the frequency to the time domain. This sequence is called the saliency map. The anomaly score is then computed as the relative difference between the saliency map values and their moving averages. If this score is above a threshold, the value at a specific timestep is flagged as an outlier. For more details, please check out the paper.
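
The steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the detector's implementation: the function names, the window sizes `window_amp` and `window_local`, and the small constants added for numerical stability are all choices made for this example.

```python
import numpy as np


def saliency_map(x, window_amp=3):
    """Spectral residual saliency map of a 1-D series."""
    fft = np.fft.fft(x)
    log_amp = np.log(np.abs(fft) + 1e-8)                 # log amplitude spectrum
    kernel = np.ones(window_amp) / window_amp
    avg_log_amp = np.convolve(log_amp, kernel, mode='same')  # local average of the log spectrum
    residual = log_amp - avg_log_amp                     # spectral residual
    phase = np.angle(fft)
    # Keep the original phase, use the residual as log amplitude, return to the time domain
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))


def sr_score(x, window_amp=3, window_local=21):
    """Relative difference between the saliency map and its trailing moving average."""
    sal = saliency_map(x, window_amp)
    padded = np.concatenate([np.full(window_local - 1, sal[0]), sal])  # pad on the left only
    kernel = np.ones(window_local) / window_local
    local_avg = np.convolve(padded, kernel, mode='valid')
    return (sal - local_avg) / (local_avg + 1e-8)


t = np.arange(500)
x = np.cos(2 * np.pi * t / 50)
x[300] += 5.0  # inject a single spike
scores = sr_score(x)
spike_idx = int(np.argmax(scores))
```

The periodic part of the signal is largely suppressed in the saliency map, so the injected spike stands out against its local average.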

Dataset

Additionally, this notebook requires the seaborn package for visualization which can be installed via pip:

Create univariate time series

We can inject noise in the time series via inject_outlier_ts. The noise can be regulated via the percentage of outliers (perc_outlier), the strength of the perturbation (n_std) and the minimum size of the noise perturbation (min_std):

Visualize part of the original and perturbed time series:

Perturbed data:

Define Spectral Residual outlier detector

Note that for the local convolution we pad the signal internally only on the left, following the paper's recommendation.

The warning tells us that we need to set the outlier threshold. This can be done with the infer_threshold method. We need to pass a batch of instances and specify what percentage of those we consider to be normal via threshold_perc. Let's assume we have some data which we know contains around 10% outliers:

Let's infer the threshold:

Let's save the outlier detector with the updated threshold:

We can load the same detector via load_detector:

Detect outliers

Predict outliers:

Display results

F1 score, accuracy, recall and confusion matrix:

Plot the outlier scores of the time series vs. the outlier threshold:

Let's zoom in on a smaller time scale to have a clear picture:

VAE outlier detection for income prediction

Method

The Variational Auto-Encoder (VAE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The VAE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.
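
The scoring rule can be illustrated without a trained model. In the sketch below the reconstructions are simulated by adding noise to the inputs, and the threshold value is arbitrary; both are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                        # hypothetical inputs
X_recon = X + rng.normal(scale=0.1, size=X.shape)  # stand-in for VAE reconstructions
X_recon[2] += 3.0                                  # pretend instance 2 is reconstructed badly

feature_score = (X - X_recon) ** 2           # squared error per feature
instance_score = feature_score.mean(axis=1)  # MSE per instance

threshold = 1.0  # illustrative value; in practice it is inferred from data
is_outlier = (instance_score > threshold).astype(int)
```

Only the badly reconstructed instance exceeds the threshold, which mirrors how a high reconstruction error leads to an outlier flag.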

Dataset

The instances contain a person's characteristics like age, marital status or education while the label represents whether the person makes more or less than $50k per year. The dataset consists of a mixture of numerical and categorical features. It is originally not an outlier detection dataset so we will inject artificial outliers. It is fetched using the Alibi library, which can be installed with pip. We also use seaborn to visualize the data:

!pip install alibi seaborn

import os
import alibi
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
from sklearn.preprocessing import OneHotEncoder
import tensorflow as tf
tf.keras.backend.clear_session()
from tensorflow.keras.layers import Dense, InputLayer

from alibi_detect.od import OutlierVAE
from alibi_detect.utils.perturbation import inject_outlier_tabular
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.saving import save_detector, load_detector
from alibi_detect.utils.visualize import plot_instance_score

def set_seed(s=0):
    np.random.seed(s)
    tf.random.set_seed(s)

Load adult dataset

The fetch_adult function returns a Bunch object containing the features, the targets, the feature names and a mapping of the categories in each categorical variable.

adult = alibi.datasets.fetch_adult()
X, y = adult.data, adult.target
feature_names = adult.feature_names
category_map_tmp = adult.category_map

Shuffle data:

set_seed(0)
Xy_perm = np.random.permutation(np.c_[X, y])
X, y = Xy_perm[:,:-1], Xy_perm[:,-1]

Reorganize data so categorical features come first, remove some features and adjust feature_names and category_map accordingly:

keep_cols = [2, 3, 5, 0, 8, 9, 10]
feature_names = feature_names[2:4] + feature_names[5:6] + feature_names[0:1] + feature_names[8:11]
print(feature_names)

X = X[:, keep_cols]
print(X.shape)

category_map = {}
i = 0
for k, v in category_map_tmp.items():
    if k in keep_cols:
        category_map[i] = v
        i += 1

Preprocess data

Standardize the numerical features, or scale them between -1 and 1:

minmax = False

X_num = X[:, -4:].astype(np.float32, copy=False)
if minmax:
    xmin, xmax = X_num.min(axis=0), X_num.max(axis=0)
    rng = (-1., 1.)
    X_num_scaled = (X_num - xmin) / (xmax - xmin) * (rng[1] - rng[0]) + rng[0]
else:  # normalize
    mu, sigma = X_num.mean(axis=0), X_num.std(axis=0)
    X_num_scaled = (X_num - mu) / sigma

Fit OHE to categorical variables:

X_cat = X[:, :-4].copy()
ohe = OneHotEncoder(categories='auto')
ohe.fit(X_cat)

Combine numerical and categorical data:

X = np.c_[X_cat, X_num_scaled].astype(np.float32, copy=False)

Define train, validation (to find outlier threshold) and test set:

n_train = 25000
n_valid = 5000
X_train, y_train = X[:n_train,:], y[:n_train]
X_valid, y_valid = X[n_train:n_train+n_valid,:], y[n_train:n_train+n_valid]
X_test, y_test = X[n_train+n_valid:,:], y[n_train+n_valid:]
print(X_train.shape, y_train.shape,
      X_valid.shape, y_valid.shape,
      X_test.shape, y_test.shape)

Create outliers

Inject outliers in the numerical features. First we need to know the features for each kind:

cat_cols = list(category_map.keys())
num_cols = [col for col in range(X.shape[1]) if col not in cat_cols]
print(cat_cols, num_cols)

Numerical

Now we can add outliers to the validation (or threshold) and test sets. For the numerical data, we need to specify the numerical columns (cols), the percentage of outliers (perc_outlier), the strength (n_std) and the minimum size of the perturbation (min_std). The outliers are distributed evenly across the numerical features:
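
To give a feel for what the parameters control, here is a simplified, hypothetical perturbation routine. It is not `inject_outlier_tabular`: the function name, the way rows and columns are chosen, and the exact form of the perturbation are assumptions made for this sketch.

```python
import numpy as np


def inject_outliers_sketch(X, cols, perc_outlier, n_std=8.0, min_std=6.0, seed=0):
    """Perturb one numerical column in perc_outlier% of the rows (illustration only)."""
    rng = np.random.default_rng(seed)
    X_out = X.copy()
    n_rows = X.shape[0]
    n_outliers = int(n_rows * perc_outlier / 100)
    rows = rng.choice(n_rows, size=n_outliers, replace=False)
    stdev = X[:, cols].std(axis=0)
    for i, row in enumerate(rows):
        j = i % len(cols)                                # cycle through columns for an even spread
        size = max(abs(rng.normal()) * n_std, min_std)   # at least min_std standard deviations
        sign = rng.choice([-1.0, 1.0])
        X_out[row, cols[j]] += sign * size * stdev[j]
    y = np.zeros(n_rows, dtype=int)
    y[rows] = 1                                          # 1 marks a perturbed row
    return X_out, y


X_demo = np.random.default_rng(1).normal(size=(200, 5))
X_demo_out, y_demo = inject_outliers_sketch(X_demo, cols=[3, 4], perc_outlier=10)
```

Larger `n_std` values make the injected outliers more extreme on average, while `min_std` guarantees that every perturbation is large enough to be distinguishable from ordinary variation.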

perc_outlier = 10
data = inject_outlier_tabular(X_valid, num_cols, perc_outlier, n_std=8., min_std=6.)
X_threshold, y_threshold = data.data, data.target
X_threshold_, y_threshold_ = X_threshold.copy(), y_threshold.copy()  # store for comparison later
outlier_perc = 100 * y_threshold.sum() / len(y_threshold)
print('{:.2f}% outliers'.format(outlier_perc))

Let's inspect an instance that was changed:

outlier_idx = np.where(y_threshold != 0)[0]
vdiff = X_threshold[outlier_idx[0]] - X_valid[outlier_idx[0]]
fdiff = np.where(vdiff != 0)[0]
print('{} changed by {:.2f}.'.format(feature_names[fdiff[0]], vdiff[fdiff[0]]))

Same thing for the test set:

data = inject_outlier_tabular(X_test, num_cols, perc_outlier, n_std=8., min_std=6.)
X_outlier, y_outlier = data.data, data.target
print('{:.2f}% outliers'.format(100 * y_outlier.sum() / len(y_outlier)))

Apply one-hot encoding

Apply OHE to the train, threshold and outlier sets:

X_train_ohe = ohe.transform(X_train[:, :-4].copy())
X_threshold_ohe = ohe.transform(X_threshold[:, :-4].copy())
X_outlier_ohe = ohe.transform(X_outlier[:, :-4].copy())
print(X_train_ohe.shape, X_threshold_ohe.shape, X_outlier_ohe.shape)

X_train = np.c_[X_train_ohe.toarray(), X_train[:, -4:]].astype(np.float32, copy=False)
X_threshold = np.c_[X_threshold_ohe.toarray(), X_threshold[:, -4:]].astype(np.float32, copy=False)
X_outlier = np.c_[X_outlier_ohe.toarray(), X_outlier[:, -4:]].astype(np.float32, copy=False)
print(X_train.shape, X_threshold.shape, X_outlier.shape)

Load or define outlier detector

load_outlier_detector = True

filepath = './models/'  # change to directory where model is downloaded
if load_outlier_detector:  # load pretrained outlier detector
    detector_type = 'outlier'
    dataset = 'adult'
    detector_name = 'OutlierVAE'
    od = fetch_detector(filepath, detector_type, dataset, detector_name)
else:  # define model, initialize, train and save outlier detector
    n_features = X_train.shape[1]
    latent_dim = 2

    encoder_net = tf.keras.Sequential(
      [
          InputLayer(input_shape=(n_features,)),
          Dense(25, activation=tf.nn.relu),
          Dense(10, activation=tf.nn.relu),
          Dense(5, activation=tf.nn.relu)
      ])

    decoder_net = tf.keras.Sequential(
      [
          InputLayer(input_shape=(latent_dim,)),
          Dense(5, activation=tf.nn.relu),
          Dense(10, activation=tf.nn.relu),
          Dense(25, activation=tf.nn.relu),
          Dense(n_features, activation=None)
      ])

    # initialize outlier detector
    od = OutlierVAE(threshold=None,  # threshold for outlier score
                    score_type='mse',  # use MSE of reconstruction error for outlier detection
                    encoder_net=encoder_net,  # can also pass VAE model instead
                    decoder_net=decoder_net,  # of separate encoder and decoder
                    latent_dim=latent_dim,
                    samples=5)

    # train
    od.fit(X_train,
           loss_fn=tf.keras.losses.mse,
           epochs=5,
           verbose=True)

    # save the trained outlier detector
    save_detector(od, filepath)

od.infer_threshold(X_threshold, threshold_perc=100-outlier_perc, outlier_perc=100)
print('New threshold: {}'.format(od.threshold))

Let’s save the outlier detector with updated threshold:

save_detector(od, filepath)

Detect outliers

od_preds = od.predict(X_outlier,
                      outlier_type='instance',
                      return_feature_score=True,
                      return_instance_score=True)

Display results

F1 score and confusion matrix:

labels = data.target_names
y_pred = od_preds['data']['is_outlier']
f1 = f1_score(y_outlier, y_pred)
acc = accuracy_score(y_outlier, y_pred)
prec = precision_score(y_outlier, y_pred)
rec = recall_score(y_outlier, y_pred)
print('F1 score: {:.2f} -- Accuracy: {:.2f} -- Precision: {:.2f} -- Recall: {:.2f}'.format(f1, acc, prec, rec))
cm = confusion_matrix(y_outlier, y_pred)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sns.heatmap(df_cm, annot=True, cbar=True, linewidths=.5)
plt.show()

Plot instance level outlier scores vs. the outlier threshold:

plot_instance_score(od_preds, y_outlier.astype(int), labels, od.threshold, ylim=(0, 25))

VAE outlier detection on CIFAR10

Method

The Variational Auto-Encoder (VAE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The VAE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is either measured as the mean squared error (MSE) between the input and the reconstructed instance or as the probability that both the input and the reconstructed instance are generated by the same process.

Dataset

CIFAR10 consists of 60,000 32 by 32 RGB images equally distributed over 10 classes.

import os
import logging
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
tf.keras.backend.clear_session()
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, Dense, Layer, Reshape, InputLayer
from tqdm import tqdm

from alibi_detect.models.tensorflow import elbo
from alibi_detect.od import OutlierVAE
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.utils.perturbation import apply_mask
from alibi_detect.saving import save_detector, load_detector
from alibi_detect.utils.visualize import plot_instance_score, plot_feature_outlier_image

logger = tf.get_logger()
logger.setLevel(logging.ERROR)

Load CIFAR10 data

train, test = tf.keras.datasets.cifar10.load_data()
X_train, y_train = train
X_test, y_test = test

X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

Load or define outlier detector

load_outlier_detector = True

filepath = 'my_path'  # change to directory where model is downloaded
detector_type = 'outlier'
dataset = 'cifar10'
detector_name = 'OutlierVAE'
filepath = os.path.join(filepath, detector_name)
if load_outlier_detector:  # load pretrained outlier detector
    od = fetch_detector(filepath, detector_type, dataset, detector_name)
else:  # define model, initialize, train and save outlier detector
    latent_dim = 1024
    
    encoder_net = tf.keras.Sequential(
      [
          InputLayer(input_shape=(32, 32, 3)),
          Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
          Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
          Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)
      ])

    decoder_net = tf.keras.Sequential(
      [
          InputLayer(input_shape=(latent_dim,)),
          Dense(4*4*128),
          Reshape(target_shape=(4, 4, 128)),
          Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
          Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
          Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
      ])
    
    # initialize outlier detector
    od = OutlierVAE(threshold=.015,  # threshold for outlier score
                    score_type='mse',  # use MSE of reconstruction error for outlier detection
                    encoder_net=encoder_net,  # can also pass VAE model instead
                    decoder_net=decoder_net,  # of separate encoder and decoder
                    latent_dim=latent_dim,
                    samples=2)
    # train
    od.fit(X_train, 
           loss_fn=elbo,
           cov_elbo=dict(sim=.05),
           epochs=50,
           verbose=False)
    
    # save the trained outlier detector
    save_detector(od, filepath)

Check quality of the VAE model

idx = 8
X = X_train[idx].reshape(1, 32, 32, 3)
X_recon = od.vae(X)

plt.imshow(X.reshape(32, 32, 3))
plt.axis('off')
plt.show()

plt.imshow(X_recon.numpy().reshape(32, 32, 3))
plt.axis('off')
plt.show()

Check outliers on original CIFAR images

X = X_train[:500]
print(X.shape)

od_preds = od.predict(X,
                      outlier_type='instance',    # use 'feature' or 'instance' level
                      return_feature_score=True,  # scores used to determine outliers
                      return_instance_score=True)
print(list(od_preds['data'].keys()))

Plot instance level outlier scores

target = np.zeros(X.shape[0],).astype(int)  # all normal CIFAR10 training instances
labels = ['normal', 'outlier']
plot_instance_score(od_preds, target, labels, od.threshold)

Visualize predictions

X_recon = od.vae(X).numpy()
plot_feature_outlier_image(od_preds, 
                           X, 
                           X_recon=X_recon,
                           instance_ids=[8, 60, 100, 330],  # pass a list with indices of instances to display
                           max_instances=5,  # max nb of instances to display
                           outliers_only=False)  # only show outlier predictions

Predict outliers on perturbed CIFAR images

# nb of predictions per image: n_masks * n_mask_sizes 
n_mask_sizes = 10
n_masks = 20
n_imgs = 50

Define masks and get images:

mask_sizes = [(2*n,2*n) for n in range(1,n_mask_sizes+1)]
print(mask_sizes)
img_ids = np.arange(n_imgs)
X_orig = X[img_ids].reshape(img_ids.shape[0], 32, 32, 3)
print(X_orig.shape)

Calculate instance level outlier scores:

#| scrolled: true
all_img_scores = []
for i in tqdm(range(X_orig.shape[0])):
    img_scores = np.zeros((len(mask_sizes),))
    for j, mask_size in enumerate(mask_sizes):
        # create masked instances
        X_mask, mask = apply_mask(X_orig[i].reshape(1, 32, 32, 3),
                                  mask_size=mask_size,
                                  n_masks=n_masks,
                                  channels=[0,1,2],
                                  mask_type='normal',
                                  noise_distr=(0,1),
                                  clip_rng=(0,1))
        # predict outliers
        od_preds_mask = od.predict(X_mask)
        score = od_preds_mask['data']['instance_score']
        # store average score over `n_masks` for a given mask size
        img_scores[j] = np.mean(score)
    all_img_scores.append(img_scores)

Visualize outlier scores vs. mask sizes

x_plt = [mask[0] for mask in mask_sizes]

for ais in all_img_scores:
    plt.plot(x_plt, ais)
plt.xticks(x_plt)  # set the ticks once, after all curves are drawn
plt.title('Outlier Score All Images for Increasing Mask Size')
plt.xlabel('Mask size')
plt.ylabel('Outlier Score')
plt.show()

# stack the per-image scores into an (n_imgs, n_mask_sizes) array and average over images
ais_np = np.stack(all_img_scores)
ais_mean = np.mean(ais_np, axis=0)
plt.title('Mean Outlier Score All Images for Increasing Mask Size')
plt.xlabel('Mask size')
plt.ylabel('Outlier score')
plt.plot(x_plt, ais_mean)
plt.xticks(x_plt)
plt.show()

Investigate instance level outlier

i = 8  # index of instance to look at

plt.plot(x_plt, all_img_scores[i])
plt.xticks(x_plt)
plt.title('Outlier Scores Image {} for Increasing Mask Size'.format(i))
plt.xlabel('Mask size')
plt.ylabel('Outlier score')
plt.show()

Reconstruction of masked images and outlier scores per channel:

all_X_mask = []
X_i = X_orig[i].reshape(1, 32, 32, 3)
all_X_mask.append(X_i)
# apply masks
for j, mask_size in enumerate(mask_sizes):
    # create masked instances
    X_mask, mask = apply_mask(X_i,
                              mask_size=mask_size,
                              n_masks=1,  # just 1 for visualization purposes
                              channels=[0,1,2],
                              mask_type='normal',
                              noise_distr=(0,1),
                              clip_rng=(0,1))
    all_X_mask.append(X_mask)
all_X_mask = np.concatenate(all_X_mask, axis=0)
all_X_recon = od.vae(all_X_mask).numpy()
od_preds = od.predict(all_X_mask)

Visualize:

plot_feature_outlier_image(od_preds, 
                           all_X_mask, 
                           X_recon=all_X_recon, 
                           max_instances=all_X_mask.shape[0], 
                           n_channels=3)

Predict outliers on a subset of features

perc_list = [20, 40, 60, 80, 100]

all_perc_scores = []
for perc in perc_list:
    od_preds_perc = od.predict(all_X_mask, outlier_perc=perc)
    iscore = od_preds_perc['data']['instance_score']
    all_perc_scores.append(iscore)
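
To make the role of outlier_perc more concrete, here is a minimal sketch in plain Python. It assumes that the instance level score is the average of the highest outlier_perc percent of the feature level scores. That is our reading of the parameter rather than something stated in this section, and the function name and toy scores are made up for illustration.

```python
import math

def instance_score(feature_scores, outlier_perc=100.0):
    """Average the highest `outlier_perc` percent of the feature level scores.

    Assumption: this mirrors how `outlier_perc` is used by the detector;
    it is an illustrative sketch, not the library implementation.
    """
    n_keep = max(1, math.ceil(outlier_perc * len(feature_scores) / 100))
    top = sorted(feature_scores)[-n_keep:]
    return sum(top) / len(top)

# toy feature scores: only 2 out of 10 features are strongly perturbed
fscore = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 6.0]
for perc in [20, 40, 60, 80, 100]:
    print(perc, instance_score(fscore, perc))
```

Under this assumption a small percentage focuses the score on the most anomalous features, so a local perturbation is not averaged away by the many unperturbed features.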

Visualize outlier scores vs. mask sizes and percentage of features used:

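# prepend 0 for the unmasked original image; rebuilding the list keeps the cell safe to re-run
x_plt = [0] + [size[0] for size in mask_sizes]
for aps in all_perc_scores:
    plt.plot(x_plt, aps)
plt.xticks(x_plt)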
plt.legend(perc_list)
plt.title('Outlier Score for Increasing Mask Size and Different Feature Subsets')
plt.xlabel('Mask Size')
plt.ylabel('Outlier Score')
plt.show()

Infer outlier threshold value

print('Current threshold: {}'.format(od.threshold))
od.infer_threshold(X, threshold_perc=99)  # assume 1% of the training data are outliers
print('New threshold: {}'.format(od.threshold))
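
Conceptually, setting threshold_perc=99 amounts to picking the 99th percentile of the outlier scores on the data passed in. The sketch below illustrates that idea in plain Python on made-up scores. The percentile helper and the scores are assumptions for illustration and do not reproduce the detector's internals.

```python
def percentile(values, perc):
    """Percentile with linear interpolation between neighbours, perc in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * perc / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

# hypothetical instance level outlier scores for 100 training instances
scores = [float(i) for i in range(1, 101)]

threshold = percentile(scores, 99)
n_flagged = sum(s > threshold for s in scores)
print(threshold, n_flagged)
```

With this threshold about 1% of the instances used for inference score above it, which matches the assumption in the comment above.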

VAE outlier detection on KDD Cup '99 dataset

Method

The Variational Auto-Encoder (VAE) outlier detector is first trained on a batch of unlabeled but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The VAE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured either as the mean squared error (MSE) between the input and the reconstructed instance or as the probability that both the input and the reconstructed instance are generated by the same process.
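
The MSE variant of this idea can be sketched in a few lines of plain Python. The reconstruct function below is a hypothetical stand-in for a trained VAE (it simply clips values to the range seen during training), and the threshold is an arbitrary illustrative value.

```python
def mse(x, x_recon):
    """Mean squared error between an instance and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)

def reconstruct(x):
    # Hypothetical stand-in for a trained VAE: this toy "model" has only
    # learned that normal features lie in [0, 1], so it clips everything else.
    return [min(max(v, 0.0), 1.0) for v in x]

X = [
    [0.2, 0.5, 0.9],   # resembles the normal training data
    [0.1, 0.4, 0.3],   # resembles the normal training data
    [3.0, -2.0, 0.5],  # far outside the range seen during training
]
threshold = 0.5  # illustrative value, not taken from the notebook

scores = [mse(x, reconstruct(x)) for x in X]
is_outlier = [int(s > threshold) for s in scores]
print(scores)
print(is_outlier)
```

Instances that the model reconstructs well get a score near zero, while the third instance is reconstructed poorly and is flagged.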

Dataset

There are 4 types of attacks in the dataset:

DOS: denial-of-service, e.g. SYN flood;
R2L: unauthorized access from a remote machine, e.g. guessing a password;
U2R: unauthorized access to local superuser (root) privileges;
probing: surveillance and other probing, e.g. port scanning.

The dataset contains about 5 million connection records.

There are 3 types of features:

basic features of individual connections, e.g. duration of connection
content features within a connection, e.g. number of failed login attempts
traffic features within a 2-second window, e.g. number of connections to the same host as the current connection

This notebook requires the seaborn package for visualization, which can be installed via pip (pip install seaborn).

Load dataset

We keep only 18 of the 41 features, all of them continuous.

Assume that a model is trained on normal instances of the dataset (not outliers) and standardization is applied:

Apply standardization:
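
A minimal sketch of this step in plain Python, assuming the mean and standard deviation are estimated on the normal training data only and then reused for any later batch. The function names and toy data are made up for illustration.

```python
import statistics

def fit_standardizer(X):
    """Return the per-feature mean and population standard deviation."""
    columns = list(zip(*X))
    mean = [statistics.fmean(col) for col in columns]
    std = [statistics.pstdev(col) for col in columns]
    return mean, std

def standardize(X, mean, std):
    """Center each feature and scale it to unit variance.

    Constant features (std == 0) are only centered to avoid division by zero.
    """
    return [
        [(v - m) / s if s > 0 else v - m for v, m, s in zip(row, mean, std)]
        for row in X
    ]

X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # toy "normal" data
X_new = [[2.0, 50.0]]                               # later batch to score

mean, std = fit_standardizer(X_train)
X_train_std = standardize(X_train, mean, std)
X_new_std = standardize(X_new, mean, std)  # reuse the training statistics
print(X_train_std)
print(X_new_std)
```

Reusing the training statistics matters: a batch that contains outliers would otherwise shift its own mean and variance and hide part of the anomaly.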

Load or define outlier detector

We could also have inferred the threshold from the normal training data by setting threshold_perc to, for example, 99 and adding a small margin on top of the inferred threshold. Let's save the outlier detector with the updated threshold:

Detect outliers

We now generate a batch of data with 10% outliers and detect the outliers in the batch.
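
One way to build such a batch is sketched below in plain Python. make_outlier_batch is a hypothetical helper written for this illustration, not a library function, and the two pools are toy data.

```python
import random

def make_outlier_batch(normal, outliers, n_samples, perc_outlier, seed=0):
    """Sample a shuffled batch with `perc_outlier` percent outliers.

    Returns the instances and their labels (0 = normal, 1 = outlier).
    """
    rng = random.Random(seed)
    n_out = round(n_samples * perc_outlier / 100)
    n_norm = n_samples - n_out
    # sample with replacement so that small pools are not a problem
    batch = [(x, 0) for x in rng.choices(normal, k=n_norm)]
    batch += [(x, 1) for x in rng.choices(outliers, k=n_out)]
    rng.shuffle(batch)
    X = [x for x, _ in batch]
    y = [label for _, label in batch]
    return X, y

normal_pool = [[0.1 * i, 1.0] for i in range(50)]      # toy normal records
outlier_pool = [[10.0 + i, -5.0] for i in range(10)]   # toy attack records

X_batch, y_batch = make_outlier_batch(normal_pool, outlier_pool,
                                      n_samples=1000, perc_outlier=10)
print(len(X_batch), sum(y_batch) / len(y_batch))
```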

Predict outliers:

Display results

F1 score and confusion matrix:
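
For reference, both quantities can be computed directly from the true labels and the binary outlier predictions. The sketch below uses plain Python and made-up labels. In practice a library routine such as scikit-learn's f1_score and confusion_matrix would normally be used.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[tn, fp], [fn, tp]] for binary labels (1 = outlier)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the outlier class."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# made-up labels: 4 true outliers, one missed and one false alarm
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(cm)
print(f1)
```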

Plot instance level outlier scores vs. the outlier threshold:

We can clearly see that some outliers are very easy to detect while others have outlier scores closer to those of the normal data. We can also plot the ROC curve for the outlier scores of the detector:
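
As a rough illustration of what the ROC curve summarizes, the sketch below sweeps a threshold over toy outlier scores and records the false and true positive rates at each step. The helper names and the data are made up for this example, which assumes both classes are present. In practice a library routine such as scikit-learn's roc_curve would normally be used.

```python
def roc_curve(y_true, scores):
    """Return (fpr, tpr) lists for thresholds swept from high to low."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve via the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))

# toy data: one normal instance (score 0.8) looks more anomalous than an outlier (0.7)
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]

fpr, tpr = roc_curve(y_true, scores)
roc_auc = auc(fpr, tpr)
print(list(zip(fpr, tpr)))
print(roc_auc)
```

The area under the curve equals the fraction of (outlier, normal) pairs in which the outlier receives the higher score, so a value close to 1 means the scores separate the two groups well regardless of the chosen threshold.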

Investigate instance level outlier

We can now take a closer look at some of the individual predictions on X_outlier.

The srv_count feature accounts for many of the displayed outliers.