1 of 11

Methods

Mahalanobis Distance

Overview

The Mahalanobis online outlier detector aims to predict anomalies in tabular data. The algorithm calculates an outlier score, which is a measure of distance from the center of the features distribution (Mahalanobis distance). If this outlier score is higher than a user-defined threshold, the observation is flagged as an outlier. The algorithm is online, which means that it starts without knowledge about the distribution of the features and learns as requests arrive. Consequently you should expect the output to be bad at the start and to improve over time. The algorithm is suitable for low to medium dimensional tabular data.

The algorithm is also able to include categorical variables. The fit step first computes pairwise distances between the categories of each categorical variable. The pairwise distances are based on either the model predictions (MVDM method) or the context provided by the other variables in the dataset (ABDM method). For MVDM, we use the difference between the conditional model prediction probabilities of each category. This method is based on the Modified Value Difference Metric (MVDM) by Cost et al (1993). ABDM stands for Association-Based Distance Metric, a categorical distance measure introduced by Le et al (2005). ABDM infers context from the presence of other variables in the data and computes a dissimilarity measure based on the Kullback-Leibler divergence. Both methods can also be combined as ABDM-MVDM. We can then apply multidimensional scaling to project the pairwise distances into Euclidean space.

Usage

Initialize

Parameters:

threshold: Mahalanobis distance threshold above which the instance is flagged as an outlier.
n_components: number of principal components used.
std_clip: feature-wise standard deviation used to clip the observations before updating the mean and covariance matrix.
start_clip: number of observations before clipping is applied.
max_n: algorithm behaves as if it has seen at most max_n points.
cat_vars: dictionary with as keys the categorical columns and as values the number of categories per categorical variable. Only needed if categorical variables are present.
ohe: boolean whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import Mahalanobis

od = Mahalanobis(
    threshold=10.,
    n_components=2,
    std_clip=3,
    start_clip=100
)

Fit

We only need to fit the outlier detector if there are categorical variables present in the data. The following parameters can be specified:

X: training batch as a numpy array.
y: model class predictions or ground truth labels for X. Used for 'mvdm' and 'abdm-mvdm' pairwise distance metrics. Not needed for 'abdm'.
d_type: pairwise distance metric used for categorical variables. Currently, 'abdm', 'mvdm' and 'abdm-mvdm' are supported. 'abdm' infers context from the other variables while 'mvdm' uses the model predictions. 'abdm-mvdm' is a weighted combination of the two metrics.
w: weight on 'abdm' (between 0. and 1.) distance if d_type equals 'abdm-mvdm'.
disc_perc: list with percentiles used in binning of numerical features used for the 'abdm' and 'abdm-mvdm' pairwise distance measures.
standardize_cat_vars: standardize numerical values of categorical variables if True.
feature_range: tuple with min and max ranges to allow for numerical values of categorical variables. Min and max ranges can be floats or numpy arrays with dimension (1, number of features) for feature-wise ranges.
smooth: smoothing exponent between 0 and 1 for the distances. Lower values will smooth the difference in distance metric between different features.
center: whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

od.fit(
    X_train,
    d_type='abdm',
    disc_perc=[25, 50, 75]
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Beware though that the outlier detector is stateful and every call to the score function will update the mean and covariance matrix, even when inferring the threshold.

Detect

We detect outliers by simply calling predict on a batch of instances X to compute the instance level Mahalanobis distances. We can also return the instance level outlier score by setting return_instance_score to True.

The prediction takes the form of a dictionary with meta and data keys. meta contains the detector's metadata while data is also a dictionary which contains the actual predictions stored in the following keys:

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (batch size,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    return_instance_score=True
)

Examples

Tabular

Outlier detection on KDD Cup 99

Isolation Forest

source

Isolation Forest

Overview

Isolation forests (IF) are tree based models specifically used for outlier detection. The IF isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. The number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node. This path length, averaged over a forest of random trees, is a measure of normality and is used to define an anomaly score. Outliers can typically be isolated quicker, leading to shorter paths. The algorithm is suitable for low to medium dimensional tabular data.

Usage

Initialize

Parameters:

threshold: threshold value for the outlier score above which the instance is flagged as an outlier.
n_estimators: number of base estimators in the ensemble. Defaults to 100.
max_samples: number of samples to draw from the training data to train each base estimator. If int, draw max_samples samples. If float, draw max_samples times number of features samples. If 'auto', max_samples = min(256, number of samples).
max_features: number of features to draw from the training data to train each base estimator. If int, draw max_features features. If float, draw max_features times number of features features.
bootstrap: whether to fit individual trees on random subsets of the training data, sampled with replacement.
n_jobs: number of jobs to run in parallel for fit and predict.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import IForest

od = IForest(
    threshold=0.,
    n_estimators=100
)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array.
sample_weight: array with shape (batch size,) used to assign different weights to each instance during training. Defaults to None.

od.fit(
    X_train
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Detect

We detect outliers by simply calling predict on a batch of instances X to compute the instance level outlier scores. We can also return the instance level outlier score by setting return_instance_score to True.

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (batch size,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    return_instance_score=True
)

Examples

Tabular

Outlier detection on KDD Cup 99

Variational Auto-Encoder

source

Variational Auto-Encoder

Overview

The Variational Auto-Encoder (VAE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised or semi-supervised training is desirable since labeled data is often scarce. The VAE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is either measured as the mean squared error (MSE) between the input and the reconstructed instance or as the probability that both the input and the reconstructed instance are generated by the same process. The algorithm is suitable for tabular and image data.

Usage

Initialize

Parameters:

threshold: threshold value above which the instance is flagged as an outlier.
score_type: scoring method used to detect outliers. Currently only the default 'mse' supported.
latent_dim: latent dimension of the VAE.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

encoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(32, 32, 3)),
      Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)
  ])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

decoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(latent_dim,)),
      Dense(4*4*128),
      Reshape(target_shape=(4, 4, 128)),
      Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
  ])

vae: instead of using a separate encoder and decoder, the VAE can also be passed as a tf.keras.Model.
samples: number of samples drawn during detection for each instance to detect.
beta: weight on the KL-divergence loss term following the $\beta$-VAE framework. Default equals 1.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import OutlierVAE

od = OutlierVAE(
    threshold=0.1,
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    latent_dim=1024,
    samples=10
)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
loss_fn: loss function used for training. Defaults to the elbo loss.
optimizer: optimizer used for training. Defaults to Adam with learning rate 1e-3.
cov_elbo: dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)) which is the default.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(
    X_train,
    epochs=50
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Detect

We detect outliers by simply calling predict on a batch of instances X. Detection can be customized via the following parameters:

outlier_type: either 'instance' or 'feature'. If the outlier type equals 'instance', the outlier score at the instance level will be used to classify the instance as an outlier or not. If 'feature' is selected, outlier detection happens at the feature level (e.g. by pixel in images).
outlier_perc: percentage of the sorted (descending) feature level outlier scores. We might for instance want to flag an image as an outlier if at least 20% of the pixel values are on average above the threshold. In this case, we set outlier_perc to 20. The default value is 100 (using all the features).
return_feature_score: boolean whether to return the feature level outlier scores.
return_instance_score: boolean whether to return the instance level outlier scores.

is_outlier: boolean whether instances or features are above the threshold and therefore outliers. If outlier_type equals 'instance', then the array is of shape (batch size,). If it equals 'feature', then the array is of shape (batch size, instance shape).
feature_score: contains feature level scores if return_feature_score equals True.
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    outlier_type='instance',
    outlier_perc=75,
    return_feature_score=True,
    return_instance_score=True
)

Examples

Image

Outlier detection on CIFAR10

Tabular

Outlier detection on KDD Cup 99

Outlier detection on Adult dataset

Auto-Encoder

Overview

The Auto-Encoder (AE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desireable since labeled data is often scarce. The AE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.

Usage

Initialize

Parameters:

threshold: threshold value above which the instance is flagged as an outlier.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

encoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(32, 32, 3)),
      Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)
  ])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

decoder_net = tf.keras.Sequential(
  [
      InputLayer(input_shape=(1024,)),
      Dense(4*4*128),
      Reshape(target_shape=(4, 4, 128)),
      Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
      Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
  ])

ae: instead of using a separate encoder and decoder, the AE can also be passed as a tf.keras.Model.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import OutlierAE

od = OutlierAE(threshold=0.1,
               encoder_net=encoder_net,
               decoder_net=decoder_net)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
loss_fn: loss function used for training. Defaults to the Mean Squared Error loss.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(X_train, epochs=50)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(X, threshold_perc=95)

Detect

We detect outliers by simply calling predict on a batch of instances X. Detection can be customized via the following parameters:

outlier_type: either 'instance' or 'feature'. If the outlier type equals 'instance', the outlier score at the instance level will be used to classify the instance as an outlier or not. If 'feature' is selected, outlier detection happens at the feature level (e.g. by pixel in images).
outlier_perc: percentage of the sorted (descending) feature level outlier scores. We might for instance want to flag an image as an outlier if at least 20% of the pixel values are on average above the threshold. In this case, we set outlier_perc to 20. The default value is 100 (using all the features).
return_feature_score: boolean whether to return the feature level outlier scores.
return_instance_score: boolean whether to return the instance level outlier scores.

is_outlier: boolean whether instances or features are above the threshold and therefore outliers. If outlier_type equals 'instance', then the array is of shape (batch size,). If it equals 'feature', then the array is of shape (batch size, instance shape).
feature_score: contains feature level scores if return_feature_score equals True.
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(X,
                   outlier_type='instance',
                   outlier_perc=75,
                   return_feature_score=True,
                   return_instance_score=True)

Examples

Image

Variational Auto-Encoding Gaussian Mixture Model

Overview

The Variational Auto-Encoding Gaussian Mixture Model (VAEGMM) Outlier Detector follows the paper but with a instead of a regular Auto-Encoder. The encoder compresses the data while the reconstructed instances generated by the decoder are used to create additional features based on the reconstruction error between the input and the reconstructions. These features are combined with encodings and fed into a Gaussian Mixture Model (). The VAEGMM outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised or semi-supervised training is desirable since labeled data is often scarce. The sample energy of the GMM can then be used to determine whether an instance is an outlier (high sample energy) or not (low sample energy). The algorithm is suitable for tabular and image data.

Usage

Initialize

Parameters:

threshold: threshold value for the sample energy above which the instance is flagged as an outlier.
latent_dim: latent dimension of the VAE.
n_gmm: number of components in the GMM.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

encoder_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(n_features,)),
    Dense(60, activation=tf.nn.tanh),
    Dense(30, activation=tf.nn.tanh),
    Dense(10, activation=tf.nn.tanh),
    Dense(latent_dim, activation=None)
])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

decoder_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(latent_dim,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(30, activation=tf.nn.tanh),
    Dense(60, activation=tf.nn.tanh),
    Dense(n_features, activation=None)
])

gmm_density_net: layers for the GMM network wrapped in a tf.keras.Sequential class. Example:

gmm_density_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(latent_dim + 2,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(n_gmm, activation=tf.nn.softmax)
])

vaegmm: instead of using a separate encoder, decoder and GMM density net, the VAEGMM can also be passed as a tf.keras.Model.
samples: number of samples drawn during detection for each instance to detect.
recon_features: function to extract features from the reconstructed instance by the decoder. Defaults to a combination of the mean squared reconstruction error and the cosine similarity between the original and reconstructed instances by the VAE.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import OutlierVAEGMM

od = OutlierVAEGMM(
    threshold=7.5,
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    gmm_density_net=gmm_density_net,
    latent_dim=4,
    n_gmm=2,
    samples=10
)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
w_recon: weight on elbo loss term. Defaults to 1e-7.
w_energy: weight on sample energy loss term. Defaults to 0.1.
w_cov_diag: weight on covariance diagonals. Defaults to 0.005.
cov_elbo: dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)) which is the default.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(
    X_train,
    epochs=10,
    batch_size=1024
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Detect

We detect outliers by simply calling predict on a batch of instances X to compute the instance level sample energies. We can also return the instance level outlier score by setting return_instance_score to True.

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (batch size,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    return_instance_score=True
)

Examples

Tabular

Auto-Encoding Gaussian Mixture Model

Overview

The Auto-Encoding Gaussian Mixture Model (AEGMM) Outlier Detector follows the paper. The encoder compresses the data while the reconstructed instances generated by the decoder are used to create additional features based on the reconstruction error between the input and the reconstructions. These features are combined with encodings and fed into a Gaussian Mixture Model (). The AEGMM outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised or semi-supervised training is desirable since labeled data is often scarce. The sample energy of the GMM can then be used to determine whether an instance is an outlier (high sample energy) or not (low sample energy). The algorithm is suitable for tabular and image data.

Usage

Initialize

Parameters:

threshold: threshold value for the sample energy above which the instance is flagged as an outlier.
n_gmm: number of components in the GMM.
encoder_net: tf.keras.Sequential instance containing the encoder network. Example:

encoder_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(n_features,)),
    Dense(60, activation=tf.nn.tanh),
    Dense(30, activation=tf.nn.tanh),
    Dense(10, activation=tf.nn.tanh),
    Dense(latent_dim, activation=None)
])

decoder_net: tf.keras.Sequential instance containing the decoder network. Example:

decoder_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(latent_dim,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(30, activation=tf.nn.tanh),
    Dense(60, activation=tf.nn.tanh),
    Dense(n_features, activation=None)
])

gmm_density_net: layers for the GMM network wrapped in a tf.keras.Sequential class. Example:

gmm_density_net = tf.keras.Sequential(
[
    InputLayer(input_shape=(latent_dim + 2,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(n_gmm, activation=tf.nn.softmax)
])

aegmm: instead of using a separate encoder, decoder and GMM density net, the AEGMM can also be passed as a tf.keras.Model.
recon_features: function to extract features from the reconstructed instance by the decoder. Defaults to a combination of the mean squared reconstruction error and the cosine similarity between the original and reconstructed instances by the AE.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import OutlierAEGMM

od = OutlierAEGMM(
    threshold=7.5,
    encoder_net=encoder_net,
    decoder_net=decoder_net,
    gmm_density_net=gmm_density_net,
    n_gmm=2
)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
loss_fn: loss function used for training. Defaults to the custom AEGMM loss which is a combination of the mean squared reconstruction error, the sample energy of the GMM and a loss term penalizing small values on the diagonals of the covariance matrices in the GMM to avoid trivial solutions. It is important to balance the loss weights below so no single loss term dominates during the optimization.
w_energy: weight on sample energy loss term. Defaults to 0.1.
w_cov_diag: weight on covariance diagonals. Defaults to 0.005.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(
    X_train,
    epochs=10,
    batch_size=1024
)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(
    X, 
    threshold_perc=95
)

Detect

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (batch size,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    return_instance_score=True
)

Examples

Tabular

Likelihood Ratios for Outlier Detection

Overview

The outlier detector described by in uses the likelihood ratio (LLR) between 2 generative models as the outlier score. One model is trained on the original data while the other is trained on a perturbed version of the dataset. This is based on the observation that the log likelihood for an instance under a generative model can be heavily affected by population level background statistics. The second generative model is therefore trained to capture the background statistics still present in the perturbed data while the semantic features have been erased by the perturbations.

The perturbations are added using an independent and identical Bernoulli distribution with rate $\mu$ which substitutes a feature with one of the other possible feature values with equal probability. For images, this means for instance changing a pixel with a different pixel value randomly sampled within the $0$ to $255$ pixel range. The package also contains a implementation adapted from the official TensorFlow Probability , and available as a standalone model in alibi_detect.models.tensorflow.pixelcnn.

Usage

Initialize

Parameters:

threshold: outlier threshold value used for the negative likelihood ratio. Scores above the threshold are flagged as outliers.
model: a generative model, either as a tf.keras.Model, TensorFlow Probability distribution or built-in PixelCNN++ model.
model_background: optional separate model fit on the perturbed background data. If this is not specified, a copy of model will be used.
log_prob: if the model does not have a log_prob function like e.g. a TensorFlow Probability distribution, a function needs to be passed that evaluates the log likelihood.
sequential: flag whether the data is sequential or not. Used to create targets during training. Defaults to False.
data_type: can specify data type added to metadata. E.g. 'tabular' or 'image'.

Initialized outlier detector example:

from alibi_detect.od import LLR
from alibi_detect.models.tensorflow import PixelCNN

image_shape = (28, 28, 1)
model = PixelCNN(image_shape)
od = LLR(threshold=-100, model=model)

Fit

We then need to train the 2 generative models in sequence. The following parameters can be specified:

X: training batch as a numpy array of preferably normal data.
mutate_fn: function used to create the perturbations. Defaults to an independent and identical Bernoulli distribution with rate $\mu$
mutate_fn_kwargs: kwargs for mutate_fn. For the default function, the mutation rate and feature range needs to be specified, e.g. dict(rate=.2, feature_range=(0,255)).
loss_fn: loss function used for the generative models.
loss_fn_kwargs: kwargs for the loss function.
epochs: number of training epochs.
batch_size: batch size used during training.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(X_train, epochs=10, batch_size=32)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:

od.infer_threshold(X, threshold_perc=95, batch_size=32)

Detect

We detect outliers by simply calling predict on a batch of instances X. Detection can be customized via the following parameters:

outlier_type: either 'instance' or 'feature'. If the outlier type equals 'instance', the outlier score at the instance level will be used to classify the instance as an outlier or not. If 'feature' is selected, outlier detection happens at the feature level (e.g. by pixel in images).
batch_size: batch size used for model prediction calls.
return_feature_score: boolean whether to return the feature level outlier scores.
return_instance_score: boolean whether to return the instance level outlier scores.

is_outlier: boolean whether instances or features are above the threshold and therefore outliers. If outlier_type equals 'instance', then the array is of shape (batch size,). If it equals 'feature', then the array is of shape (batch size, instance shape).
feature_score: contains feature level scores if return_feature_score equals True.
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(X, outlier_type='instance', batch_size=32)

Examples

Image

Sequential Data

Prophet Detector

source

Prophet Detector

Overview

The Prophet outlier detector uses the Prophet time series forecasting package explained in this excellent paper. The underlying Prophet model is a decomposable univariate time series model combining trend, seasonality and holiday effects. The model forecast also includes an uncertainty interval around the estimated trend component using the MAP estimate of the extrapolated model. Alternatively, full Bayesian inference can be done at the expense of increased compute. The upper and lower values of the uncertainty interval can then be used as outlier thresholds for each point in time. First, the distance from the observed value to the nearest uncertainty boundary (upper or lower) is computed. If the observation is within the boundaries, the outlier score equals the negative distance. As a result, the outlier score is the lowest when the observation equals the model prediction. If the observation is outside of the boundaries, the score equals the distance measure and the observation is flagged as an outlier. One of the main drawbacks of the method however is that you need to refit the model as new data comes in. This is undesirable for applications with high throughput and real-time detection.

Note

To use this detector, first install Prophet by running:

pip install alibi-detect[prophet]

This will install Prophet, and its major dependency PyStan. PyStan is currently only partly supported on Windows. If this detector is to be used on a Windows system, it is recommended to manually install (and test) PyStan before running the command above.

Usage

Initialize

Parameters:

threshold: width of the uncertainty intervals of the forecast, used as outlier threshold. Equivalent to interval_width. If the instance lies outside of the uncertainty intervals, it is flagged as an outlier. If mcmc_samples equals 0, it is the uncertainty in the trend using the MAP estimate of the extrapolated model. If mcmc_samples >0, then uncertainty over all parameters is used.
growth: 'linear' or 'logistic' to specify a linear or logistic trend.
cap: growth cap in case growth equals 'logistic'.
holidays: pandas DataFrame with columns 'holiday' (string) and 'ds' (dates) and optionally columns 'lower_window' and 'upper_window' which specify a range of days around the date to be included as holidays.
holidays_prior_scale: parameter controlling the strength of the holiday components model. Higher values imply a more flexible trend, more prone to more overfitting.
country_holidays: include country-specific holidays via country abbreviations. The holidays for each country are provided by the holidays package in Python. A list of available countries and the country name to use is available on: https://github.com/dr-prodigy/python-holidays. Additionally, Prophet includes holidays for: Brazil (BR), Indonesia (ID), India (IN), Malaysia (MY), Vietnam (VN), Thailand (TH), Philippines (PH), Turkey (TU), Pakistan (PK), Bangladesh (BD), Egypt (EG), China (CN) and Russian (RU).
changepoint_prior_scale: parameter controlling the flexibility of the automatic changepoint selection. Large values will allow many changepoints, potentially leading to overfitting.
changepoint_range: proportion of history in which trend changepoints will be estimated. Higher values means more changepoints, potentially leading to overfitting.
seasonality_mode: either 'additive' or 'multiplicative'.
daily_seasonality: can be 'auto', True, False, or a number of Fourier terms to generate.
weekly_seasonality: can be 'auto', True, False, or a number of Fourier terms to generate.
yearly_seasonality: can be 'auto', True, False, or a number of Fourier terms to generate.
add_seasonality: manually add one or more seasonality components. Pass a list of dicts containing the keys 'name', 'period', 'fourier_order' (obligatory), 'prior_scale' and 'mode' (optional).
seasonality_prior_scale: parameter controlling the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, potentially leading to overfitting.
uncertainty_samples: number of simulated draws used to estimate uncertainty intervals.
mcmc_samples: If > 0, will do full Bayesian inference with the specified number of MCMC samples. If 0, will do MAP estimation.

Initialized outlier detector example:

from alibi_detect.od import OutlierProphet

od = OutlierProphet(
    threshold=0.9,
    growth='linear'
)

Fit

We then need to train the outlier detector. The fit method takes a pandas DataFrame df with as columns 'ds' containing the dates or timestamps and 'y' for the time series being investigated. The date format is ideally YYYY-MM-DD and timestamp format YYYY-MM-DD HH:MM:SS.

od.fit(df)

Detect

We detect outliers by simply calling predict on a DataFrame df, again with columns 'ds' and 'y' to compute the instance level outlier scores. We can also return the instance level outlier score or the raw Prophet model forecast by setting respectively return_instance_score or return_forecast to True. It is important that the dates or timestamps of the test data follow the training data.

is_outlier: DataFrame with columns 'ds' containing the dates or timestamps and 'is_outlier' a boolean whether instances are above the threshold and therefore outlier instances.
instance_score: DataFrame with 'ds' and 'instance_score' which contains instance level scores if return_instance_score equals True.
forecast: DataFrame with the raw model predictions if return_forecast equals True. The DataFrame contains columns with the upper and lower boundaries ('yhat_upper' and 'yhat_lower'), the model predictions ('yhat'), and the decomposition of the prediction in the different components (trend, seasonality, holiday).

preds = od.predict(
    df,
    return_instance_score=True,
    return_forecast=True
)

Examples

Time-series outlier detection using Prophet on weather data

Spectral Residual

source

Spectral Residual

Overview

The Spectral Residual outlier detector is based on the paper Time-Series Anomaly Detection Service at Microsoft and is suitable for unsupervised online anomaly detection in univariate time series data. The algorithm first computes the Fourier Transform of the original data. Then it computes the spectral residual of the log amplitude of the transformed signal before applying the Inverse Fourier Transform to map the sequence back from the frequency to the time domain. This sequence is called the saliency map. The anomaly score is then computed as the relative difference between the saliency map values and their moving averages. If the score is above a threshold, the value at a specific timestep is flagged as an outlier. For more details, please check out the paper.

Usage

Initialize

Parameters:

threshold: Threshold used to classify outliers. Relative saliency map distance from the moving average.
window_amp: Window used for the moving average in the spectral residual computation. The spectral residual is the difference between the log amplitude of the Fourier Transform and a convolution of the log amplitude over window_amp.
window_local: Window used for the moving average in the outlier score computation. The outlier score computes the relative difference between the saliency map and a moving average of the saliency map over window_local timesteps.
padding_amp_method: Padding method to be used prior to each convolution over log amplitude. Possible values: constant | replicate | reflect. Default value: replicate.
- constant - padding with constant 0.
- replicate - repeats the last/extreme value.
- reflect - reflects the time series.
padding_local_method: Padding method to be used prior to each convolution over saliency map. Possible values: constant | replicate | reflect. Default value: replicate.
- constant - padding with constant 0.
- replicate - repeats the last/extreme value.
- reflect - reflects the time series.
padding_amp_side: Whether to pad the amplitudes on both sides or only on one side. Possible values: bilateral | left | right.
n_est_points: Number of estimated points padded to the end of the sequence.
n_grad_points: Number of points used for the gradient estimation of the additional points padded to the end of the sequence. The paper sets this value to 5.

Initialized outlier detector example:

from alibi_detect.od import SpectralResidual

od = SpectralResidual(
    threshold=1.,
    window_amp=20,
    window_local=20,
    padding_amp_method='reflect',
    padding_local_method='reflect',
    padding_amp_side='bilateral',
    n_est_points=10,
    n_grad_points=5
)

It is often hard to find a good threshold value. If we have a time series containing both normal and outlier data and we know approximately the percentage of normal data in the time series, we can infer a suitable threshold:

od.infer_threshold(
    X,
    t=t,  # array with timesteps, assumes dt=1 between observations if omitted
    threshold_perc=95
)

Detect

We detect outliers by simply calling predict on a time series X to compute the outlier scores and flag the anomalies. We can also return the instance (timestep) level outlier score by setting return_instance_score to True.

is_outlier: boolean whether instances are above the threshold and therefore outlier instances. The array is of shape (timesteps,).
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(
    X,
    t=t,  # array with timesteps, assumes dt=1 between observations if omitted
    return_instance_score=True
)

Examples

Time series outlier detection with Spectral Residuals on synthetic data

Sequence-to-Sequence (Seq2Seq)

source

Sequence-to-Sequence (Seq2Seq)

Overview

The Sequence-to-Sequence (Seq2Seq) outlier detector consists of 2 main building blocks: an encoder and a decoder. The encoder consists of a Bidirectional LSTM which processes the input sequence and initializes the decoder. The LSTM decoder then makes sequential predictions for the output sequence. In our case, the decoder aims to reconstruct the input sequence. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.

Since even for normal data the reconstruction error can be state-dependent, we add an outlier threshold estimator network to the Seq2Seq model. This network takes in the hidden state of the decoder at each timestep and predicts the estimated reconstruction error for normal data. As a result, the outlier threshold is not static and becomes a function of the model state. This is similar to Park et al. (2017), but while they train the threshold estimator separately from the Seq2Seq model with a Support-Vector Regressor, we train a neural net regression network end-to-end with the Seq2Seq model.

The detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desireable since labeled data is often scarce. The Seq2Seq outlier detector is suitable for both univariate and multivariate time series.

Usage

Initialize

Parameters:

n_features: number of features in the time series.
seq_len: sequence length fed into the Seq2Seq model.
threshold: threshold used for outlier detection. Can be a float or feature-wise array.
seq2seq: optionally pass an already defined or pretrained Seq2Seq model to the outlier detector as a tf.keras.Model.
threshold_net: optionally pass the layers for the threshold estimation network wrapped in a tf.keras.Sequential instance. Example:

threshold_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(seq_len, latent_dim)),
        Dense(64, activation=tf.nn.relu),
        Dense(64, activation=tf.nn.relu),
    ])

latent_dim: latent dimension of the encoder and decoder.
output_activation: activation used in the Dense output layer of the decoder.
beta: weight on the threshold estimation mean-squared error (MSE) loss term.

Initialized outlier detector example:

from alibi_detect.od import OutlierSeq2Seq

n_features = 2
seq_len = 50

od = OutlierSeq2Seq(n_features,
                    seq_len,
                    threshold=None,
                    threshold_net=threshold_net,
                    latent_dim=100)

Fit

We then need to train the outlier detector. The following parameters can be specified:

X: univariate or multivariate time series array with preferably normal data used for training. Shape equals (batch, n_features) or (batch, seq_len, n_features).
loss_fn: loss function used for training. Defaults to the MSE loss.
optimizer: optimizer used for training. Defaults to Adam with learning rate 1e-3.
epochs: number of training epochs.
batch_size: batch size used during training.
verbose: boolean whether to print training progress.
log_metric: additional metrics whose progress will be displayed if verbose equals True.

od.fit(X_train, epochs=20)

It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold. We can either set the threshold over both features combined or determine a feature-wise threshold. Here we opt for the feature-wise threshold. This is for instance useful when different features have different variance or sensitivity to outliers. The snippet assumes there are about 5% outliers in the first feature and 10% in the second:

od.infer_threshold(X, threshold_perc=np.array([95, 90]))

Detect

We detect outliers by simply calling predict on a batch of instances X. Detection can be customized via the following parameters:

outlier_type: either 'instance' or 'feature'. If the outlier type equals 'instance', the outlier score at the instance level will be used to classify the instance as an outlier or not. If 'feature' is selected, outlier detection happens at the feature level. It is important to distinguish 2 use cases:
- X has shape (batch, n_features):
  - There are batch instances with n_features features per instance.
- X has shape (batch, seq_len, n_features)
  - Now there are batch instances with seq_len x n_features features per instance.
outlier_perc: percentage of the sorted (descending) feature level outlier scores. We might for instance want to flag a multivariate time series as an outlier at a specific timestamp if at least 75% of the feature values are on average above the threshold. In this case, we set outlier_perc to 75. The default value is 100 (using all the features).
return_feature_score: boolean whether to return the feature level outlier scores.
return_instance_score: boolean whether to return the instance level outlier scores.

is_outlier: boolean whether instances or features are above the threshold and therefore outliers. If outlier_type equals 'instance', then the array is of shape (batch,). If it equals 'feature', then the array is of shape (batch, seq_len, n_features) or (batch, n_features), depending on the shape of X.
feature_score: contains feature level scores if return_feature_score equals True.
instance_score: contains instance level scores if return_instance_score equals True.

preds = od.predict(X,
                   outlier_type='instance',
                   outlier_perc=100,
                   return_feature_score=True,
                   return_instance_score=True)

Examples

Time series outlier detection with Seq2Seq models on synthetic data

Seq2Seq time series outlier detection on ECG data