The Prophet outlier detector uses the Prophet time series forecasting package, explained in this excellent paper. The underlying Prophet model is a decomposable univariate time series model combining trend, seasonality and holiday effects. The model forecast also includes an uncertainty interval around the estimated trend component, using the MAP estimate of the extrapolated model. Alternatively, full Bayesian inference can be done at the expense of increased compute. The upper and lower values of the uncertainty interval can then be used as outlier thresholds for each point in time. First, the distance from the observed value to the nearest uncertainty boundary (upper or lower) is computed. If the observation is within the boundaries, the outlier score equals the negative distance. As a result, the outlier score is lowest when the observation equals the model prediction. If the observation is outside of the boundaries, the score equals the distance and the observation is flagged as an outlier. One of the main drawbacks of the method, however, is that the model needs to be refit as new data comes in. This is undesirable for applications with high throughput and real-time detection.
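As an illustrative numpy sketch of this scoring rule (assuming a forecast with `yhat_lower` and `yhat_upper` columns, as returned by Prophet):

```python
import numpy as np
import pandas as pd

def prophet_outlier_score(y: np.ndarray, forecast: pd.DataFrame) -> np.ndarray:
    """Signed distance to the nearest uncertainty boundary.

    Negative inside the interval (inlier), positive outside (outlier).
    """
    lower = forecast['yhat_lower'].values
    upper = forecast['yhat_upper'].values
    # max(lower - y, y - upper) is the negative distance to the nearest
    # boundary inside the interval, and the positive distance outside it
    return np.maximum(lower - y, y - upper)
```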
Note
To use this detector, first install Prophet by running:
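```python
# plausible install command, assuming alibi-detect exposes a `prophet` extra
# (adjust to your environment if needed)
!pip install alibi-detect[prophet]
```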
This will install Prophet, and its major dependency PyStan. PyStan is currently only partly supported on Windows. If this detector is to be used on a Windows system, it is recommended to manually install (and test) PyStan before running the command above.
The example uses a weather time series dataset recorded by the Max-Planck-Institute for Biogeochemistry. The dataset contains 14 different features such as air temperature, atmospheric pressure, and humidity. These were collected every 10 minutes, beginning in 2003. Like the TensorFlow time-series tutorial, we only use data collected between 2009 and 2016.
Select subset to test Prophet model on:
The Prophet model expects a DataFrame with 2 columns: one named `ds` with the timestamps and one named `y` with the time series to be evaluated. We will just look at the temperature data:
We train an outlier detector from scratch:
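A minimal sketch of this step, assuming the `OutlierProphet` class from `alibi_detect.od` (the `threshold` argument maps to the width of Prophet's uncertainty interval; the value and the `df_T` name are illustrative):

```python
from alibi_detect.od import OutlierProphet

# threshold: width of the uncertainty interval used as outlier boundary
od = OutlierProphet(threshold=0.99)
od.fit(df_T)  # df_T: DataFrame with 'ds' timestamps and 'y' temperature values
```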
Please check out the documentation as well as the original Prophet documentation on how to customize the Prophet-based outlier detector: add seasonalities and holidays, opt for a saturating logistic growth model, or apply parameter regularization.
Define the test data. It is important that the timestamps of the test data follow the training data. We check this below by comparing the first few rows of the test DataFrame with the last few of the training DataFrame:
Predict outliers on test data:
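A sketch of the prediction call (the `return_forecast` flag is an assumption about the API):

```python
od_preds = od.predict(
    df_T_test,                   # DataFrame with 'ds' and 'y' for the test period
    return_instance_score=True,  # include instance level outlier scores
    return_forecast=True         # include the underlying Prophet forecast
)
```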
We can first visualize our predictions with Prophet's built-in plotting functionality. This also allows us to include historical predictions:
We can also plot the breakdown of the different components in the forecast. Since we did not do full Bayesian inference with `mcmc_samples`, the uncertainty intervals of the forecast are determined by the MAP estimate of the extrapolated trend.
It is clear that the further into the future we predict, the wider the uncertainty intervals which determine the outlier threshold become.
Let's overlay the actual data with the upper and lower outlier threshold predictions and check where we predicted outliers:
Outlier scores and predictions:
The outlier scores naturally trend down as the uncertainty intervals widen when we predict further into the future.
Let's look at some individual outliers:
Isolation forests (IF) are tree-based models specifically used for outlier detection. The IF isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. The number of splits required to isolate a sample is equivalent to the path length from the root node to the terminating node. This path length, averaged over a forest of random trees, is a measure of normality and is used to define an anomaly score. Outliers can typically be isolated more quickly, leading to shorter paths.
The outlier detector needs to detect computer network intrusions using TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. A connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address under some well defined protocol. Each connection is labeled as either normal, or as an attack.
There are 4 types of attacks in the dataset:
DOS: denial-of-service, e.g. syn flood;
R2L: unauthorized access from a remote machine, e.g. guessing password;
U2R: unauthorized access to local superuser (root) privileges;
probing: surveillance and other probing, e.g., port scanning.
The dataset contains about 5 million connection records.
There are 3 types of features:
basic features of individual connections, e.g. duration of connection
content features within a connection, e.g. number of failed login attempts
traffic features within a 2-second window, e.g. number of connections to the same host as the current connection
This notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
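```python
# install the visualization dependency
!pip install seaborn
```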
We only keep a number of the continuous features (18 out of 41).
Assume that a model is trained on normal instances of the dataset (not outliers) and standardization is applied:
Apply standardization:
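A minimal sketch of this step (the `mean` and `stdev` names are reused later when preprocessing the outlier batch; `X_train` holding the normal instances is an assumption from the surrounding steps):

```python
import numpy as np

# compute statistics on the normal (inlier) training instances only
mean, stdev = X_train.mean(axis=0), X_train.std(axis=0)

# standardize the training data with the inlier statistics
X_train = ((X_train - mean) / stdev).astype(np.float32)
```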
We train an outlier detector from scratch:
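A minimal sketch of this step, assuming the `IForest` wrapper from `alibi_detect.od` (hyperparameters are illustrative):

```python
from alibi_detect.od import IForest

od = IForest(threshold=None,    # threshold to be inferred below
             n_estimators=100)  # number of trees in the forest
od.fit(X_train)                 # X_train: standardized inlier instances
```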
The warning tells us we still need to set the outlier threshold. This can be done with the `infer_threshold` method. We need to pass a batch of instances and specify what percentage of those we consider to be normal via `threshold_perc`. Let's assume we have some data which we know contains around 5% outliers. The percentage of outliers can be set with `perc_outlier` in the `create_outlier_batch` function.
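A sketch of this step, assuming the `create_outlier_batch` helper used throughout the alibi-detect examples (its exact signature is an assumption) and the `kddcup` Bunch loaded earlier:

```python
from alibi_detect.utils.data import create_outlier_batch

# sample a batch with ~5% outliers (sizes are illustrative)
threshold_batch = create_outlier_batch(kddcup.data, kddcup.target,
                                       n_samples=1000, perc_outlier=5)
X_threshold = ((threshold_batch.data - mean) / stdev).astype('float32')

# infer the threshold so that 95% of the batch is treated as normal
od.infer_threshold(X_threshold, threshold_perc=95)
```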
Let's save the outlier detector with updated threshold:
We now generate a batch of data with 10% outliers and detect the outliers in the batch.
Predict outliers:
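A sketch of the prediction and the returned dictionary (variable names are illustrative):

```python
# X_outlier: standardized batch containing ~10% outliers
od_preds = od.predict(X_outlier, return_instance_score=True)
y_pred = od_preds['data']['is_outlier']      # binary outlier flags per instance
scores = od_preds['data']['instance_score']  # instance level outlier scores
```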
F1 score and confusion matrix:
Plot instance level outlier scores vs. the outlier threshold:
We can see that the isolation forest does not do a good job of detecting one type of outlier, whose outlier scores sit around 0. This makes inferring a good threshold without explicit knowledge about the outliers hard. Setting the threshold just below 0 would lead to significantly better detector performance for the outliers in this dataset. This is also reflected by the ROC curve:
The Auto-Encoder (AE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The AE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.
CIFAR10 consists of 60,000 32 by 32 RGB images equally distributed over 10 classes.
The pretrained outlier and adversarial detectors used in the example notebooks can be found here. You can use the built-in `fetch_detector` function which saves the pre-trained models in a local directory `filepath` and loads the detector. Alternatively, you can train a detector from scratch:
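A sketch of training an AE detector from scratch on CIFAR10, assuming the `OutlierAE` class from `alibi_detect.od` (the architecture and threshold are illustrative):

```python
import tensorflow as tf
from tensorflow.keras.layers import (Conv2D, Conv2DTranspose, Dense,
                                     Flatten, InputLayer, Reshape)
from alibi_detect.od import OutlierAE

encoding_dim = 1024  # size of the bottleneck (illustrative)

encoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(32, 32, 3)),
    Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
    Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
    Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu),
    Flatten(),
    Dense(encoding_dim)
])
decoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(encoding_dim,)),
    Dense(4 * 4 * 128),
    Reshape(target_shape=(4, 4, 128)),
    Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
    Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
    Conv2DTranspose(3, 4, strides=2, padding='same', activation=None)
])

od = OutlierAE(threshold=0.015,  # initial threshold (illustrative)
               encoder_net=encoder_net,
               decoder_net=decoder_net)
od.fit(X_train, epochs=50, verbose=True)
```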
We perturb CIFAR images by adding random noise to patches (masks) of the image. For each mask size in `n_mask_sizes`, sample `n_masks` and apply those to each of the `n_imgs` images. Then we predict outliers on the masked instances:
Define masks and get images:
Calculate instance level outlier scores:
Reconstruction of masked images and outlier scores per channel:
Visualize:
The sensitivity of the outlier detector can not only be controlled via the `threshold`, but also by selecting the percentage of the features used for the instance level outlier score computation. For instance, we might want to flag outliers if 40% of the features (pixels for images) have an average outlier score above the threshold. This is possible via the `outlier_perc` argument in the `predict` function. It specifies the percentage of the features that are used for outlier detection, sorted in descending order of outlier score.
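A sketch of such a call (the flags follow the alibi-detect predict API as best understood; percentages are illustrative):

```python
od_preds = od.predict(X_mask,
                      outlier_type='instance',   # 'instance' or 'feature' level
                      outlier_perc=40,           # use the top 40% of feature scores
                      return_instance_score=True,
                      return_feature_score=True)
```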
Visualize outlier scores vs. mask sizes and percentage of features used:
Finding good threshold values can be tricky since they are typically not easy to interpret. The `infer_threshold` method helps find a sensible value. We need to pass a batch of instances `X` and specify what percentage of those we consider to be normal via `threshold_perc`.
The AEGMM method follows the Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection ICLR 2018 paper. The encoder compresses the data while the reconstructed instances generated by the decoder are used to create additional features based on the reconstruction error between the input and the reconstructions. These features are combined with the encodings and fed into a Gaussian Mixture Model (GMM). Training of the AEGMM model is unsupervised on normal (inlier) data. The sample energy of the GMM can then be used to determine whether an instance is an outlier (high sample energy) or not (low sample energy). VAEGMM on the other hand uses a variational autoencoder instead of a plain autoencoder.
The outlier detector needs to detect computer network intrusions using TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. A connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address under some well defined protocol. Each connection is labeled as either normal, or as an attack.
There are 4 types of attacks in the dataset:
DOS: denial-of-service, e.g. syn flood;
R2L: unauthorized access from a remote machine, e.g. guessing password;
U2R: unauthorized access to local superuser (root) privileges;
probing: surveillance and other probing, e.g., port scanning.
The dataset contains about 5 million connection records.
There are 3 types of features:
basic features of individual connections, e.g. duration of connection
content features within a connection, e.g. number of failed login attempts
traffic features within a 2-second window, e.g. number of connections to the same host as the current connection
This notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
We only keep a number of the continuous features (18 out of 41).
Assume that a model is trained on normal instances of the dataset (not outliers) and standardization is applied:
Apply standardization:
The pretrained outlier and adversarial detectors used in the example notebooks can be found here. You can use the built-in `fetch_detector` function which saves the pre-trained models in a local directory `filepath` and loads the detector. Alternatively, you can train a detector from scratch:
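A sketch of an AEGMM detector for the 18 features, assuming the `OutlierAEGMM` class from `alibi_detect.od` (layer sizes are illustrative; the GMM density net takes the latent encoding plus 2 reconstruction-error features):

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, InputLayer
from alibi_detect.od import OutlierAEGMM

n_features, latent_dim, n_gmm = 18, 1, 2  # n_gmm: nb of mixture components

encoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(n_features,)),
    Dense(7, activation=tf.nn.tanh),
    Dense(3, activation=tf.nn.tanh),
    Dense(latent_dim, activation=None)
])
decoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(latent_dim,)),
    Dense(3, activation=tf.nn.tanh),
    Dense(7, activation=tf.nn.tanh),
    Dense(n_features, activation=None)
])
# input: latent encoding + 2 reconstruction features (cosine similarity
# and relative Euclidean distance between input and reconstruction)
gmm_density_net = tf.keras.Sequential([
    InputLayer(input_shape=(latent_dim + 2,)),
    Dense(10, activation=tf.nn.tanh),
    Dense(n_gmm, activation=tf.nn.softmax)
])

od = OutlierAEGMM(threshold=None,
                  encoder_net=encoder_net,
                  decoder_net=decoder_net,
                  gmm_density_net=gmm_density_net,
                  n_gmm=n_gmm)
od.fit(X_train, epochs=30, batch_size=1024, verbose=True)
```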
The warning tells us we still need to set the outlier threshold. This can be done with the `infer_threshold` method. We need to pass a batch of instances and specify what percentage of those we consider to be normal via `threshold_perc`. Let's assume we have some data which we know contains around 5% outliers. The percentage of outliers can be set with `perc_outlier` in the `create_outlier_batch` function.
Save outlier detector with updated threshold:
We now generate a batch of data with 10% outliers and detect the outliers in the batch.
Predict outliers:
F1 score and confusion matrix:
Plot instance level outlier scores vs. the outlier threshold:
We can also plot the ROC curve for the outlier scores of the detector:
We can visualize the encodings of the instances in the latent space and the features derived from the instance reconstructions by the decoder. The encodings and features are then fed into the GMM density network.
A lot of the outliers are already separated well in the latent space.
We can again instantiate the pretrained VAEGMM detector from the Google Cloud Bucket. You can use the built-in `fetch_detector` function which saves the pre-trained models in a local directory `filepath` and loads the detector. Alternatively, you can train a detector from scratch:
We need to infer the threshold again:
Save outlier detector with updated threshold:
Predict:
F1 score and confusion matrix:
Plot instance level outlier scores vs. the outlier threshold:
You can zoom in by adjusting the min and max values in `ylim`. We can also compare the VAEGMM ROC curve with AEGMM:
The outlier detector described by Ren et al. (2019) in Likelihood Ratios for Out-of-Distribution Detection uses the likelihood ratio between 2 generative models as the outlier score. One model is trained on the original data while the other is trained on a perturbed version of the dataset. This is based on the observation that the likelihood score for an instance under a generative model can be heavily affected by population level background statistics. The second generative model is therefore trained to capture the background statistics still present in the perturbed data while the semantic features have been erased by the perturbations.
The perturbations are added using an independent and identically distributed Bernoulli distribution with rate $\mu$, which substitutes a feature with one of the other possible feature values with equal probability. For images, this means replacing a pixel with a different pixel value randomly sampled within the $0$ to $255$ pixel range.
The generative model used in the example is a PixelCNN++, adapted from the official TensorFlow Probability implementation, and available as a standalone model via `from alibi_detect.models.tensorflow import PixelCNN`.
The training set Fashion-MNIST consists of 60,000 28 by 28 grayscale images distributed over 10 classes. The classes represent items of clothing such as shirts or trousers. At test time, we want to distinguish the Fashion-MNIST test set from MNIST, which consists of 28 by 28 grayscale images of digits from 0 to 9.
This notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
The in-distribution dataset is Fashion-MNIST and the out-of-distribution dataset we'd like to detect is MNIST.
We now need to define our generative model. This is not necessary if the pretrained detector is later loaded from the Google Bucket.
Key PixelCNN++ arguments in a nutshell:
`num_resnet`: number of layers (Fig.2 PixelCNN) within each hierarchical block (Fig.2 PixelCNN++).
`num_hierarchies`: number of blocks separated by expansions or contractions of dimensions. See Fig.2 PixelCNN++.
`num_filters`: number of convolutional filters.
`num_logistic_mix`: number of components in the logistic mixture distribution.
`receptive_field_dims`: height and width in pixels of the receptive field above and to the left of a given pixel.
Optionally, a different model can be passed to the detector with the argument `model_background`. The Likelihood Ratio paper mentions that additional $L2$-regularization (`l2_weight`) for the background model could improve detection performance.
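A sketch of how the model could be defined with these arguments (the values mirror a small configuration and are illustrative):

```python
from alibi_detect.models.tensorflow import PixelCNN

image_shape = (28, 28, 1)  # Fashion-MNIST / MNIST input shape
model = PixelCNN(
    image_shape=image_shape,
    num_resnet=5,
    num_hierarchies=2,
    num_filters=32,
    num_logistic_mix=1,
    receptive_field_dims=(3, 3),
    l2_weight=0.  # set > 0 for the background model, per the paper's suggestion
)
```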
We can again either fetch the pretrained detector from a Google Cloud Bucket or train one from scratch:
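A sketch of training the detector from scratch, assuming the `LLR` class from `alibi_detect.od` (training settings are illustrative; the detector internally trains the background model on perturbed data):

```python
from alibi_detect.od import LLR

od = LLR(threshold=None, model=model)  # model: the PixelCNN++ defined above
od.fit(X_train, epochs=10, batch_size=32)
```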
We can load our saved detector again by defining the PixelCNN architectures for the semantic and background models as well as providing the shape of the input data:
Let's sample some instances from the semantic model to check how good our generative model is:
Most of the instances look like they represent the dataset well. When we do the same thing for our background model, we see that there is some background noise injected:
Let's compare the log likelihoods of the inliers vs. the outlier data under the semantic and background models. Although MNIST data looks very distinct from Fashion-MNIST, the generative model does not distinguish well between the 2 datasets as shown by the histograms of the log likelihoods:
This is due to the dominance of the background which is similar (basically lots of $0$'s for both datasets). If we however take the likelihood ratio, the MNIST data are detected as outliers. And this is exactly what the outlier detector does as well:
We follow the same procedure with the outlier detector. First we need to set an outlier threshold with `infer_threshold`. We need to pass a batch of instances and specify what percentage of those we consider to be normal via `threshold_perc`. Let's assume we have a small batch of data with roughly $50$% outliers but we don't know exactly which ones.
Let's save the outlier detector with updated threshold:
Let's now predict outliers on the combined Fashion-MNIST and MNIST datasets:
F1 score, accuracy, precision, recall and confusion matrix:
We can also plot the ROC curve based on the instance level outlier scores and compare it with the likelihood of only the semantic model:
To understand why the likelihood ratio works to detect outliers but the raw log likelihoods don't, it is helpful to look at the pixel-wise log likelihoods of both the semantic and background models.
Plot in-distribution instances:
It is clear that both the semantic and background model attach high probabilities to the background pixels. This effect is cancelled out in the likelihood ratio in the last column. The same applies to the out-of-distribution instances:
The outlier detector described by Ren et al. (2019) in Likelihood Ratios for Out-of-Distribution Detection uses the likelihood ratio between 2 generative models as the outlier score. One model is trained on the original data while the other is trained on a perturbed version of the dataset. This is based on the observation that the likelihood score for an instance under a generative model can be heavily affected by population level background statistics. The second generative model is therefore trained to capture the background statistics still present in the perturbed data, while the semantic features have been erased by the perturbations.
The perturbations are added using an independent and identically distributed Bernoulli distribution with rate $\mu$, which substitutes a feature with one of the other possible feature values with equal probability. Each feature in the genome dataset can take 4 values (one of the ACGT nucleobases), so a perturbed feature is swapped with one of the other nucleobases. The generative model used in the example is a simple LSTM network.
The bacteria genomics dataset for out-of-distribution detection was released as part of the paper. From the original TL;DR: the dataset contains genomic sequences of 250 base pairs from 10 in-distribution bacteria classes for training, 60 OOD bacteria classes for validation, and another 60 different OOD bacteria classes for test. There are respectively 1 million, 7 million and 7 million sequences in the training, validation and test sets. For detailed info on the dataset, check the paper.
This notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
`X` represents the genome sequences and `y` whether they are outliers ($1$) or not ($0$).
There are no outliers in the training set and a majority of outliers (compared to the training data) in the validation and test sets:
We need to define a generative model which models the genome sequences. We follow the paper and opt for a simple LSTM. Note that we don't actually need to define the model below if we simply load the pretrained detector later on:
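A sketch of such a model, assuming sequences of 250 integer-encoded nucleobases (layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Embedding, Input, LSTM

seq_len, vocab_size = 250, 4  # 250 base pairs, ACGT vocabulary

inputs = Input(shape=(seq_len,), dtype=tf.int32)
x = Embedding(vocab_size, 16)(inputs)           # embed the nucleobases
x = LSTM(124, return_sequences=True)(x)         # model the sequence
logits = Dense(vocab_size, activation=None)(x)  # next-token logits per position
model = tf.keras.Model(inputs=inputs, outputs=logits)
```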
We also need to define our loss function which we can utilize to evaluate the log-likelihood for the outlier detector:
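A minimal sketch of such a loss, assuming the model outputs next-token logits (how the detector consumes the loss is an assumption):

```python
from tensorflow.keras.losses import SparseCategoricalCrossentropy

# negative log-likelihood of each nucleobase under the model's logits
loss_fn = SparseCategoricalCrossentropy(from_logits=True)
```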
Let's compare the log likelihoods of the inliers vs. the outlier test set data under the semantic and background models. We randomly sample $100,000$ instances from both distributions since the full test set contains $7,000,000$ genomic sequences. The histograms show that the generative model does not distinguish well between inliers and outliers.
This is because of the background effect, which in this case is the GC-content in the genomic sequences. This effect is partially reduced when taking the likelihood ratio:
We follow the same procedure with the outlier detector. First we need to set an outlier threshold with `infer_threshold`. We need to pass a batch of instances and specify what percentage of those we consider to be normal via `threshold_perc`. Let's assume we have a small batch of data with roughly $30$% outliers but we don't know exactly which ones.
Let's save the outlier detector with updated threshold:
Let's predict outliers on a sample of the test set:
F1 score, accuracy, precision, recall and confusion matrix:
We can also plot the ROC curve based on the instance level outlier scores:
The Sequence-to-Sequence (Seq2Seq) outlier detector consists of 2 main building blocks: an encoder and a decoder. The encoder consists of a Bidirectional LSTM which processes the input sequence and initializes the decoder. The LSTM decoder then makes sequential predictions for the output sequence. In our case, the decoder aims to reconstruct the input sequence. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.
Since even for normal data the reconstruction error can be state-dependent, we add an outlier threshold estimator network to the Seq2Seq model. This network takes in the hidden state of the decoder at each timestep and predicts the estimated reconstruction error for normal data. As a result, the outlier threshold is not static and becomes a function of the model state. This is similar to Park et al. (2017), but while they train the threshold estimator separately from the Seq2Seq model with a Support-Vector Regressor, we train a neural net regression network end-to-end with the Seq2Seq model.
The detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The Seq2Seq outlier detector is suitable for both univariate and multivariate time series.
We test the outlier detector on a synthetic dataset generated with the TimeSynth package. It allows you to generate a wide range of time series (e.g. pseudo-periodic, autoregressive or Gaussian Process generated signals) and noise types (white or red noise). It can be installed as follows:
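```python
# TimeSynth is not on PyPI; installing from the GitHub repo
# (assumption about the canonical repo location)
!pip install git+https://github.com/TimeSynth/TimeSynth.git
```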
Additionally, this notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
Visualize:
We still need to set the outlier threshold. This can be done with the `infer_threshold` method. We need to pass a time series of instances and specify what percentage of those we consider to be normal via `threshold_perc`. First we create outliers by injecting noise in the time series via `inject_outlier_ts`. The noise can be regulated via the percentage of outliers (`perc_outlier`), the strength of the perturbation (`n_std`) and the minimum size of the noise perturbation (`min_std`). Let's assume we have some data which we know contains around 10% outliers in either of the features:
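A sketch of this step, assuming the `inject_outlier_ts` helper from `alibi_detect.utils.perturbation` (argument values are illustrative):

```python
from alibi_detect.utils.perturbation import inject_outlier_ts

# perturb ~10% of the timesteps with noise of at least 1 standard deviation
data = inject_outlier_ts(X, perc_outlier=10, perc_window=10,
                         n_std=2., min_std=1.)
X_threshold, y_threshold = data.data, data.target
```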
Visualize outlier data used to determine the threshold:
Let's infer the threshold. The `inject_outlier_ts` method distributes perturbations evenly across features. As a result, each feature contains about 5% outliers. We can either set the threshold over both features combined or determine a feature-wise threshold. Here we opt for the feature-wise threshold, which is for instance useful when different features have different variance or sensitivity to outliers. We also manually decrease the threshold a bit to increase the sensitivity of our detector:
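A sketch of what this could look like; passing a per-feature percentage to `infer_threshold` and nudging the threshold afterwards are assumptions about the API:

```python
import numpy as np

# one threshold percentage per feature (assumption: array input is supported)
od.infer_threshold(X_threshold, threshold_perc=np.array([95., 95.]))
od.threshold -= .05  # lower the threshold slightly to increase sensitivity
```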
Let's save the outlier detector with the updated threshold:
We can load the same detector via `load_detector`:
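A sketch of the save/load round trip, assuming the saving utilities live in `alibi_detect.saving` (older releases expose them under `alibi_detect.utils.saving`):

```python
from alibi_detect.saving import save_detector, load_detector

filepath = './od_seq2seq'     # illustrative save directory
save_detector(od, filepath)   # persists config, weights and threshold
od = load_detector(filepath)  # restores the detector
```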
Generate the outliers to detect:
Predict outliers:
F1 score, accuracy, recall and confusion matrix:
Plot the feature-wise outlier scores of the time series for each timestep vs. the outlier threshold:
We can also plot the ROC curve using the instance level outlier scores:
The Mahalanobis online outlier detector aims to predict anomalies in tabular data. The algorithm calculates an outlier score, which is a measure of distance from the center of the feature distribution (Mahalanobis distance). If this outlier score is higher than a user-defined threshold, the observation is flagged as an outlier. The algorithm is online, which means that it starts without knowledge about the distribution of the features and learns as requests arrive. Consequently, you should expect the output to be bad at the start and to improve over time.
The outlier detector needs to detect computer network intrusions using TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. A connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address under some well defined protocol. Each connection is labeled as either normal, or as an attack.
There are 4 types of attacks in the dataset:
DOS: denial-of-service, e.g. syn flood;
R2L: unauthorized access from a remote machine, e.g. guessing password;
U2R: unauthorized access to local superuser (root) privileges;
probing: surveillance and other probing, e.g., port scanning.
The dataset contains about 5 million connection records.
There are 3 types of features:
basic features of individual connections, e.g. duration of connection
content features within a connection, e.g. number of failed login attempts
traffic features within a 2-second window, e.g. number of connections to the same host as the current connection
This notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
We only keep a number of the continuous features (18 out of 41).
Assume that a machine learning model is trained on normal instances of the dataset (not outliers) and standardization is applied:
We train an outlier detector from scratch.
Be aware that `Mahalanobis` is an online, stateful outlier detector. Saving or loading a Mahalanobis detector therefore also saves and loads the state of the detector. This allows the user to warm up the detector before deploying it into production.
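A minimal sketch of defining the detector, assuming the `Mahalanobis` class from `alibi_detect.od`:

```python
from alibi_detect.od import Mahalanobis

od = Mahalanobis(threshold=None)  # outlier threshold to be inferred below
```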
The warning tells us we still need to set the outlier threshold. This can be done with the `infer_threshold` method. We need to pass a batch of instances and specify what percentage of those we consider to be normal via `threshold_perc`. Let's assume we have some data which we know contains around 5% outliers. The percentage of outliers can be set with `perc_outlier` in the `create_outlier_batch` function.
We now generate a batch of data with 10% outliers, standardize those with the `mean` and `stdev` values obtained from the normal data (inliers), and detect the outliers in the batch.
Predict outliers:
We can now save the warmed-up outlier detector:
F1 score and confusion matrix:
Plot instance level outlier scores vs. the outlier threshold:
We can also plot the ROC curve for the outlier scores of the detector:
So far we only tracked continuous variables. We can however also include categorical variables. The `fit` step first computes pairwise distances between the categories of each categorical variable. The pairwise distances are based on either the model predictions (MVDM method) or the context provided by the other variables in the dataset (ABDM method). For MVDM, we use the difference between the conditional model prediction probabilities of each category. This method is based on the Modified Value Difference Metric (MVDM) by Cost and Salzberg (1993). ABDM stands for Association-Based Distance Metric, a categorical distance measure introduced by Le and Ho (2005). ABDM infers context from the presence of other variables in the data and computes a dissimilarity measure based on the Kullback-Leibler divergence. Both methods can also be combined as ABDM-MVDM. We can then apply multidimensional scaling to project the pairwise distances into Euclidean space.
Create a dictionary with as keys the categorical columns and as values the number of categories for each variable in the dataset. This dictionary will later be used in the `fit` step of the outlier detector.
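A sketch of building the dictionary (the `cat_cols` list of categorical column indices is an assumption):

```python
import numpy as np

# map each categorical column index to its number of categories
cat_vars_ord = {col: len(np.unique(X[:, col])) for col in cat_cols}
```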
Fit an ordinal encoder on the categorical data:
Combine the scaled numerical and ordinal features. `X_fit` will be used to infer distances between categorical features later. To make it easy, we already transform the whole dataset, including the outliers that need to be detected later. This is for illustrative purposes:
We use the same threshold as for the continuous data. This will likely not result in optimal performance. Alternatively, you can infer the threshold again.
Set `fit` parameters:
Apply the `fit` method to find numerical values for the categorical variables:
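A sketch of the fit call, assuming the categorical variables are passed at initialization and distances are inferred with the ABDM method (argument names follow the alibi-detect API as best understood):

```python
from alibi_detect.od import Mahalanobis

od = Mahalanobis(threshold=None,
                 cat_vars=cat_vars_ord,  # categorical columns and cardinalities
                 ohe=False)              # data is ordinally encoded, not one-hot

# infer numerical distances between categories from the context provided by
# the other variables (ABDM); continuous features are first discretized into
# percentile bins
od.fit(X_fit, d_type='abdm', disc_perc=[25, 50, 75])
```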
The numerical values for the categorical features are stored in the attribute `od.d_abs`. This is a dictionary with as keys the columns of the categorical features and as values the numerical equivalent of each category:
Another option would be to set `d_type` to `'mvdm'` and `y` to `kddcup.target` to infer the numerical values for categorical variables from the model labels (or alternatively the predictions).
Generate batch of data with 10% outliers:
Preprocess the outlier batch:
Predict outliers:
F1 score and confusion matrix:
Plot instance level outlier scores vs. the outlier threshold:
Since we will apply one-hot encoding (OHE) on the categorical variables, we convert `cat_vars_ord` from the ordinal to OHE format. `alibi_detect.utils.mapping` contains utility functions to do this. The keys in `cat_vars_ohe` now represent the first column index of each one-hot encoded categorical variable. This dictionary will later be used in the outlier detector.
Fit a one-hot encoder on the categorical data:
Transform `X_fit` to OHE:
Initialize:
Apply fit method:
Transform outlier batch to OHE:
Predict outliers:
F1 score and confusion matrix:
Plot instance level outlier scores vs. the outlier threshold:
The Variational Auto-Encoder (VAE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The VAE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.
The instances contain a person's characteristics like age, marital status or education, while the label represents whether the person makes more or less than $50k per year. The dataset consists of a mixture of numerical and categorical features. It is originally not an outlier detection dataset, so we will inject artificial outliers. It is fetched using the Alibi library, which can be installed with `pip`. We also use `seaborn` to visualize the data:
The `fetch_adult` function returns a `Bunch` object containing the features, the targets, the feature names and a mapping of the categories in each categorical variable.
Shuffle data:
Reorganize the data so categorical features come first, remove some features, and adjust `feature_names` and `category_map` accordingly:
Normalize the numerical features or scale them between -1 and 1:
Fit OHE to categorical variables:
Combine numerical and categorical data:
Define train, validation (to find outlier threshold) and test set:
Inject outliers in the numerical features. First we need to know the features for each kind:
Now we can add outliers to the validation (or threshold) and test sets. For the numerical data, we need to specify the numerical columns (`cols`), the percentage of outliers (`perc_outlier`), the strength (`n_std`) and the minimum size of the perturbation (`min_std`). The outliers are distributed evenly across the numerical features:
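A sketch of this step, assuming an `inject_outlier_tabular` helper in `alibi_detect.utils.perturbation` analogous to the time series version (name, signature, values and the `cols_num` list are assumptions):

```python
from alibi_detect.utils.perturbation import inject_outlier_tabular

# perturb ~10% of instances in the numerical columns with strong noise
data = inject_outlier_tabular(X_threshold, cols=cols_num, perc_outlier=10,
                              n_std=8., min_std=6.)
X_threshold, y_threshold = data.data, data.target
```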
Let's inspect an instance that was changed:
Same thing for the test set:
Apply OHE to the train, threshold and outlier sets:
The pretrained outlier and adversarial detectors used in the example notebooks can be found here. You can use the built-in `fetch_detector` function which saves the pre-trained models in a local directory `filepath` and loads the detector. Alternatively, you can train a detector from scratch:
The warning tells us we still need to set the outlier threshold. This can be done with the `infer_threshold` method. We need to pass a batch of instances and specify what percentage of those we consider to be normal via `threshold_perc`.
Let's save the outlier detector with updated threshold:
F1 score and confusion matrix:
Plot instance level outlier scores vs. the outlier threshold:
The Spectral Residual outlier detector is based on the paper Time-Series Anomaly Detection Service at Microsoft and is suitable for unsupervised online anomaly detection in univariate time series data. The algorithm first computes the Fourier Transform of the original data. Then it computes the spectral residual of the log amplitude of the transformed signal before applying the Inverse Fourier Transform to map the sequence back from the frequency to the time domain. This sequence is called the saliency map. The anomaly score is then computed as the relative difference between the saliency map values and their moving averages. If this score is above a threshold, the value at a specific timestep is flagged as an outlier. For more details, please check out the paper.
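The algorithm can be summarized in a few lines of numpy. This is an illustrative sketch of the saliency map computation described above, not the library's implementation (the window size is illustrative):

```python
import numpy as np

def saliency_map(x: np.ndarray, window: int = 20) -> np.ndarray:
    """Spectral residual saliency map of a univariate series x."""
    fft = np.fft.fft(x)
    amp, phase = np.abs(fft), np.angle(fft)
    log_amp = np.log(amp + 1e-8)
    # spectral residual: log amplitude minus its moving average
    kernel = np.ones(window) / window
    residual = log_amp - np.convolve(log_amp, kernel, mode='same')
    # map back to the time domain, keeping the original phase
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))
```

The anomaly score then follows as the relative difference between the saliency map values and their moving averages, as described above.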
We test the outlier detector on a synthetic dataset generated with the TimeSynth package. It allows you to generate a wide range of time series (e.g. pseudo-periodic, autoregressive or Gaussian Process generated signals) and noise types (white or red noise). It can be installed as follows:
Additionally, this notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
Define number of sampled points and the type of simulated time series. We use TimeSynth to generate a sinusoidal signal with Gaussian noise.
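A sketch based on the TimeSynth README (exact API details are assumptions; sizes and frequencies are illustrative):

```python
import timesynth as ts

n_points = 100000
time_sampler = ts.TimeSampler(stop_time=n_points // 4)
time_samples = time_sampler.sample_regular_time(num_points=n_points)

sinusoid = ts.signals.Sinusoidal(frequency=0.25)
white_noise = ts.noise.GaussianNoise(std=0.1)
timeseries = ts.TimeSeries(sinusoid, noise_generator=white_noise)
X, signals, errors = timeseries.sample(time_samples)
```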
We can inject noise in the time series via `inject_outlier_ts`. The noise can be regulated via the percentage of outliers (`perc_outlier`), the strength of the perturbation (`n_std`) and the minimum size of the noise perturbation (`min_std`):
Visualize part of the original and perturbed time series:
Perturbed data:
Note that for the local convolution we pad the signal internally only on the left, following the paper's recommendation.
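A sketch of defining the detector, assuming the `SpectralResidual` class from `alibi_detect.od` (window sizes are illustrative):

```python
from alibi_detect.od import SpectralResidual

od = SpectralResidual(
    threshold=None,   # threshold to be inferred below
    window_amp=20,    # window for the moving average of the log amplitude
    window_local=20,  # window for the local moving average of the saliency map
    n_est_points=20   # number of estimated points used to pad the series
)
```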
The warning tells us that we need to set the outlier threshold. This can be done with the `infer_threshold` method. We need to pass a batch of instances and specify what percentage of those we consider to be normal via `threshold_perc`. Let's assume we have some data which we know contains around 10% outliers:
Let's infer the threshold:
Let's save the outlier detector with the updated threshold:
We can load the same detector via `load_detector`:
Predict outliers:
F1 score, accuracy, recall and confusion matrix:
Plot the outlier scores of the time series vs. the outlier threshold:
Let's zoom in on a smaller time scale to have a clear picture:
The Sequence-to-Sequence (Seq2Seq) outlier detector consists of 2 main building blocks: an encoder and a decoder. The encoder consists of a Bidirectional LSTM which processes the input sequence and initializes the decoder. The LSTM decoder then makes sequential predictions for the output sequence. In our case, the decoder aims to reconstruct the input sequence. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is measured as the mean squared error (MSE) between the input and the reconstructed instance.
Since even for normal data the reconstruction error can be state-dependent, we add an outlier threshold estimator network to the Seq2Seq model. This network takes in the hidden state of the decoder at each timestep and predicts the estimated reconstruction error for normal data. As a result, the outlier threshold is not static and becomes a function of the model state. This is similar to Park et al. (2017), but while they train the threshold estimator separately from the Seq2Seq model with a Support-Vector Regressor, we train a neural net regression network end-to-end with the Seq2Seq model.
The detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The Seq2Seq outlier detector is suitable for both univariate and multivariate time series.
The outlier detector needs to spot anomalies in electrocardiograms (ECG's). The dataset contains 5000 ECG's, originally obtained from Physionet under the name BIDMC Congestive Heart Failure Database (chfdb), record chf07. The data has been pre-processed in 2 steps: first each heartbeat is extracted, and then each beat is made equal length via interpolation. The data is labeled and contains 5 classes. The first class, which contains almost 60% of the observations, is seen as normal while the others are outliers. The detector is trained on heartbeats from the first class and needs to flag the other classes as anomalies.
This notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
Flip train and test data because there are only 500 ECG's in the original training set and 4500 in the test set:
Since we treat the first class as the normal, inlier data and the rest of `X_train` as outliers, we need to adjust the training (inlier) data and the labels of the test set. Some of the outliers in `X_train` are used in combination with some of the inlier instances to infer the threshold level:
Apply min-max scaling between 0 and 1 to the observations using the inlier data:
Reshape the observations to (batch size, sequence length, features) for the detector:
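A sketch of this step, using the 140 timesteps per heartbeat mentioned below (variable names are illustrative):

```python
import numpy as np

seq_len, n_features = 140, 1  # 140 timesteps per heartbeat, univariate series
X_train = X_train.reshape(-1, seq_len, n_features).astype(np.float32)
X_test = X_test.reshape(-1, seq_len, n_features).astype(np.float32)
```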
We can now visualize scaled instances from each class:
The pretrained outlier and adversarial detectors used in the example notebooks can be found here. You can use the built-in `fetch_detector` function which saves the pre-trained models in a local directory `filepath` and loads the detector. Alternatively, you can train a detector from scratch:
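A sketch of training from scratch, assuming the `OutlierSeq2Seq` class from `alibi_detect.od` (hyperparameters are illustrative):

```python
from alibi_detect.od import OutlierSeq2Seq

od = OutlierSeq2Seq(
    n_features=1,    # univariate time series
    seq_len=140,     # sequence length per instance
    threshold=None,  # threshold to be inferred later
    latent_dim=40    # size of the LSTM hidden state (illustrative)
)
od.fit(X_train, epochs=100, verbose=False)
```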
Let's inspect how well the sequence-to-sequence model can predict the ECG's of the inlier and outlier classes. The predictions in the charts below are made on ECG's from the test set:
It is clear that the model can reconstruct the inlier class but struggles with the outliers.
If we trained a model from scratch, the warning thrown when we initialized the model tells us that we need to set the outlier threshold. This can be done with the `infer_threshold` method. We need to pass a time series of instances and specify what percentage of those we consider to be normal via `threshold_perc`, equal to the percentage of Class 1 in `X_threshold`. The `outlier_perc` parameter defines the percentage of features used to define the outlier threshold. In this example, the number of features considered per instance equals 140 (1 for each timestep). We set `outlier_perc` at 95, which means that we use the 95% of features with the highest reconstruction error, adjusted for by the threshold estimate.
Let's save the outlier detector with the updated threshold:
We can load the same detector via `load_detector`:
F1 score, accuracy, recall and confusion matrix:
We can also plot the ROC curve based on the instance level outlier scores:
The Variational Auto-Encoder (VAE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The VAE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is either measured as the mean squared error (MSE) between the input and the reconstructed instance or as the probability that both the input and the reconstructed instance are generated by the same process.
The outlier detector needs to detect computer network intrusions using TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. A connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address under some well defined protocol. Each connection is labeled as either normal, or as an attack.
There are 4 types of attacks in the dataset:
DOS: denial-of-service, e.g. syn flood;
R2L: unauthorized access from a remote machine, e.g. guessing password;
U2R: unauthorized access to local superuser (root) privileges;
probing: surveillance and other probing, e.g., port scanning.
The dataset contains about 5 million connection records.
There are 3 types of features:
basic features of individual connections, e.g. duration of connection
content features within a connection, e.g. number of failed login attempts
traffic features within a 2-second window, e.g. number of connections to the same host as the current connection
This notebook requires the `seaborn` package for visualization, which can be installed via `pip`:
We only keep a number of the continuous features (18 out of 41).
Assume that a model is trained on normal instances of the dataset (not outliers) and standardization is applied:
Apply standardization:
The pretrained outlier and adversarial detectors used in the example notebooks can be found here. You can use the built-in `fetch_detector` function which saves the pre-trained models in a local directory `filepath` and loads the detector. Alternatively, you can train a detector from scratch:
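A sketch of training a VAE detector from scratch on the 18 standardized features, assuming the `OutlierVAE` class from `alibi_detect.od` (the architecture is illustrative):

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, InputLayer
from alibi_detect.od import OutlierVAE

n_features, latent_dim = 18, 2

encoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(n_features,)),
    Dense(20, activation=tf.nn.relu),
    Dense(15, activation=tf.nn.relu),
    Dense(7, activation=tf.nn.relu)
])
decoder_net = tf.keras.Sequential([
    InputLayer(input_shape=(latent_dim,)),
    Dense(7, activation=tf.nn.relu),
    Dense(15, activation=tf.nn.relu),
    Dense(n_features, activation=None)
])

od = OutlierVAE(threshold=None,
                score_type='mse',  # score reconstruction error with MSE
                encoder_net=encoder_net,
                decoder_net=decoder_net,
                latent_dim=latent_dim,
                samples=5)         # nb of MC samples from the latent space
od.fit(X_train, epochs=30, verbose=True)
```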
The warning tells us we still need to set the outlier threshold. This can be done with the `infer_threshold` method. We need to pass a batch of instances and specify what percentage of those we consider to be normal via `threshold_perc`. Let's assume we have some data which we know contains around 5% outliers. The percentage of outliers can be set with `perc_outlier` in the `create_outlier_batch` function.
We could also have inferred the threshold from the normal training data by setting `threshold_perc` e.g. at 99 and adding a bit of margin on top of the inferred threshold. Let's save the outlier detector with updated threshold:
We now generate a batch of data with 10% outliers and detect the outliers in the batch.
Predict outliers:
F1 score and confusion matrix:
Plot instance level outlier scores vs. the outlier threshold:
We can clearly see that some outliers are very easy to detect while others have outlier scores closer to the normal data. We can also plot the ROC curve for the outlier scores of the detector:
We can now take a closer look at some of the individual predictions on `X_outlier`.
The `srv_count` feature is responsible for a lot of the displayed outliers.
The Variational Auto-Encoder (VAE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desirable since labeled data is often scarce. The VAE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is either measured as the mean squared error (MSE) between the input and the reconstructed instance or as the probability that both the input and the reconstructed instance are generated by the same process.
CIFAR10 consists of 60,000 32 by 32 RGB images equally distributed over 10 classes.
The pretrained outlier and adversarial detectors used in the example notebooks can be found here. You can use the built-in `fetch_detector` function which saves the pre-trained models in a local directory `filepath` and loads the detector. Alternatively, you can train a detector from scratch:
We perturb CIFAR images by adding random noise to patches (masks) of the image. For each mask size in `n_mask_sizes`, sample `n_masks` and apply those to each of the `n_imgs` images. Then we predict outliers on the masked instances:
Define masks and get images:
Calculate instance level outlier scores:
Reconstruction of masked images and outlier scores per channel:
Visualize:
The sensitivity of the outlier detector can not only be controlled via the `threshold`, but also by selecting the percentage of the features used for the instance level outlier score computation. For instance, we might want to flag outliers if 40% of the features (pixels for images) have an average outlier score above the threshold. This is possible via the `outlier_perc` argument in the `predict` function. It specifies the percentage of the features that are used for outlier detection, sorted in descending order of outlier score.
Visualize outlier scores vs. mask sizes and percentage of features used:
Finding good threshold values can be tricky since they are typically not easy to interpret. The `infer_threshold` method helps find a sensible value. We need to pass a batch of instances `X` and specify what percentage of those we consider to be normal via `threshold_perc`.