Alibi Detect aims to be the go-to library for outlier, adversarial and drift detection in Python using both the TensorFlow and PyTorch backends.
This means that the algorithms in the library need to handle:
- Online detection, often with stateful detectors.
- Offline detection, where the detector is trained on a batch of unsupervised or semi-supervised data. This assumption reflects many real-world settings where labels are hard to come by.
The algorithms will cover the following data types:
- Tabular, including both numerical and categorical data.
- Images
- Time series, both univariate and multivariate.
- Text
- Graphs
It will also be possible to combine different algorithms in ensemble detectors.
The library currently covers both online and offline outlier detection algorithms for tabular data, images and time series, as well as offline adversarial detectors for tabular data and images. Current drift detection capabilities cover almost any data modality, such as mixed-type tabular data, text, images or graphs, in both the online and offline settings. Furthermore, Alibi Detect provides supervised drift and context-aware drift detectors.
The near-term focus will be on extending save/load functionality for PyTorch detectors, and adding outlier detectors for text and mixed data types.
Alibi Detect is an open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. Both TensorFlow and PyTorch backends are supported for drift detection.
For more background on the importance of monitoring outliers and distributions in a production setting, check out this talk from the Challenges in Deploying and Monitoring Machine Learning Systems ICML 2020 workshop, based on the paper Monitoring and explainability of models in production and referencing Alibi Detect.
For a thorough introduction to drift detection, check out the talk below, titled Protecting Your Machine Learning Against Drift: An Introduction. The talk covers what drift is and why it pays to detect it, the different types of drift, and how drift can be detected in a principled manner; it also describes the anatomy of a drift detector.
For advanced use cases, Alibi Detect features powerful configuration-file-based functionality. As shown below, drift detectors can be specified with a configuration file named `config.toml` (adversarial and outlier detectors coming soon!), which can then be passed to {func}`~alibi_detect.saving.load_detector`:
Compared to standard instantiation, config-driven instantiation has a number of advantages:
- Human readable: The `config.toml` files are human-readable (and editable!), providing a readily accessible record of previously created detectors.
- Flexible artefact specification: Artefacts such as datasets and models can be specified as locally serialized objects, or as runtime registered objects (see Specifying complex fields). Multiple detectors can share the same artefacts, and they can be easily swapped.
- Inbuilt validation: The {func}`~alibi_detect.saving.load_detector` function uses pydantic to validate detector configurations.
To get a general idea of the expected layout of a config file, see the Example config files. Alternatively, to obtain a fully populated config file for reference, users can run one of the example notebooks and generate a config file by passing an instantiated detector to {func}`~alibi_detect.saving.save_detector`.
All detector configuration files follow a consistent layout, simplifying the process of writing simple config files by hand. For example, a {class}`~alibi_detect.cd.KSDrift` detector with a dill-serialized function to preprocess reference and test data can be specified as:
The `name` field should always be the name of the detector, for example `KSDrift` or `SpotTheDiffDrift`. The remaining fields are the args/kwargs to pass to the detector (see the {mod}`alibi_detect.cd` docs for a full list of permissible args/kwargs for each detector). All config fields follow this convention; however, as discussed in Specifying artefacts, some fields can be more complex than others.
(complex_fields)=
When specifying a detector via a `config.toml` file, the locally stored reference data `x_ref` must be specified. In addition, many detectors also require (or allow) additional artefacts, such as kernels, functions and models. Depending on their type, artefacts can be specified in `config.toml` in a number of ways:
- Function/object registry: As discussed in Registering artefacts, functions and other objects defined at runtime can be registered using {func}`alibi_detect.saving.registry`, allowing them to be specified in the config file without having to serialise them. For convenience, a number of Alibi Detect functions such as {func}`~alibi_detect.cd.tensorflow.preprocess.preprocess_drift` are also pre-registered.
- Dictionaries: More complex artefacts are specified via nested dictionaries, usually containing a `src` field and additional option/setting fields. Sometimes these fields may be nested artefact dictionaries themselves. See Artefact dictionaries for further details.
The following table shows the allowable formats for all possible config file artefacts.
(dictionaries)=
Simple artefacts, for example a simple preprocessing function serialized in a dill file, can be specified directly: `preprocess_fn = "function.dill"`. However, more complex artefacts can be specified as an artefact dictionary:
config.toml (excerpt)
Here, the `preprocess_fn` field is a {class}`~alibi_detect.saving.schemas.PreprocessConfig` artefact dictionary. In this example, specifying the `preprocess_fn` function as a dictionary allows us to specify additional kwargs to be passed to the function upon loading. This example also demonstrates the flexibility of the TOML format, with dictionaries able to be specified with `{}` brackets or by sections demarcated with `[]` brackets (see the TOML documentation for more details on the TOML format).
Other config fields in the {ref}`all-artefacts-table` table can be specified via artefact dictionaries in a similar way. For example, the `model` and `proj` fields can be set as TensorFlow or PyTorch models via the {class}`~alibi_detect.saving.schemas.ModelConfig` dictionary. Often an artefact dictionary may itself contain nested artefact dictionaries, as is the case in the following example, where a `preprocess_fn` is specified with a TensorFlow `model`.
config.toml (excerpt)
Each artefact dictionary has an associated pydantic model which is used for validation of config files. The documentation for these pydantic models provides a description of the permissible fields for each artefact dictionary. For examples of how the artefact dictionaries can be used in practice, see {ref}`examples`.
(registering_artefacts)=
Custom artefacts defined in Python code may be specified in the config file without the need to serialise them, by first adding them to the Alibi Detect artefact registry using the {mod}`alibi_detect.saving.registry` submodule. This submodule harnesses the catalogue library to allow functions to be registered with a decorator syntax:
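The pattern can be sketched as follows. To keep the snippet runnable without Alibi Detect installed, it uses a toy `ToyRegistry` class as a stand-in (a hypothetical helper, not the library's implementation); the commented lines indicate how the equivalent registration would look with the real `registry` submodule. The `resolve` helper is likewise an assumption, showing one way a config value with a leading `@` could be mapped back to the registered object.

```python
from typing import Any, Dict, List, Optional


class ToyRegistry:
    """Toy name -> object registry; a stand-in, not Alibi Detect's implementation."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}

    def register(self, name: str, func: Optional[Any] = None) -> Any:
        """Register `func` under `name`, or return a decorator if `func` is omitted."""
        def _add(obj: Any) -> Any:
            self._objects[name] = obj
            return obj
        return _add if func is None else _add(func)

    def get(self, name: str) -> Any:
        return self._objects[name]

    def get_all(self) -> Dict[str, Any]:
        return dict(self._objects)


registry = ToyRegistry()

# With Alibi Detect installed, the equivalent registration would be:
#   from alibi_detect.saving import registry
#   @registry.register("my_function.v1")
#   def my_function(x): ...


@registry.register("my_function.v1")
def my_function(x: List[float]) -> List[float]:
    """Example preprocessing step: square every element."""
    return [value ** 2 for value in x]


def resolve(value: Any) -> Any:
    """Map '@name' strings to registered objects; pass other values through."""
    if isinstance(value, str) and value.startswith("@"):
        return registry.get(value[1:])
    return value


preprocess_fn = resolve("@my_function.v1")
print(preprocess_fn([1.0, 2.0, 3.0]))
```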
Once the custom function has been registered, it can be specified in `config.toml` files via its reference string (with `@` prepended), for example `"@my_function.v1"` in this case. Other objects, such as custom TensorFlow or PyTorch models, can also be registered by using the `register` function directly. For example, to register a TensorFlow encoder model:
A registered object's metadata can be obtained with `registry.find()`, and all currently registered objects can be listed with `registry.get_all()`. For example, `registry.find("my_function.v1")` returns the following:
For convenience, Alibi Detect also pre-registers a number of commonly used utility functions and objects.
| Function/Class | Registry reference* | TensorFlow | PyTorch |
| --- | --- | --- | --- |
| {func}`~alibi_detect.cd.tensorflow.preprocess.preprocess_drift` | `'@cd.[backend].preprocess.preprocess_drift'` | ✔ | ✔ |
| {class}`~alibi_detect.utils.tensorflow.kernels.GaussianRBF` | `'@utils.[backend].kernels.GaussianRBF'` | ✔ | ✔ |
| {class}`~alibi_detect.utils.tensorflow.data.TFDataset` | `'@utils.tensorflow.data.TFDataset'` | ✔ | |
*For backend-specific functions/classes, `[backend]` should be replaced with the desired backend, e.g. `tensorflow` or `pytorch`.
These can be used in `config.toml` files. Of particular importance are the `preprocess_drift` utility functions, which allow models, tokenizers and embeddings to be easily specified for preprocessing, as demonstrated in the IMDB example.
(examples)=
% To demonstrate the config-driven functionality, example detector configurations are presented in this section.
% To download a config file and its related artefacts, click on the Run Me tabs, copy the Python code, and run it
% in your local Python shell.
(imdb_example)=