Datasets
The package also contains functionality in alibi_detect.datasets to fetch a number of datasets for different modalities. Depending on the arguments, each fetch function returns either the data and labels or a Bunch object with the data, labels and optional metadata. Example:
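A minimal sketch of the two return styles, using fetch_ecg as the loader. It assumes, based on our reading of the alibi-detect API, that `fetch_ecg(return_X_y=True)` returns `((X_train, y_train), (X_test, y_test))` and that the Bunch exposes `data_train`, `target_train`, `data_test` and `target_test` keys; check the API reference of your installed version. The helper `xy_from_bunch` and the wrapper `load_ecg` are hypothetical, not part of alibi-detect.

```python
from typing import Any, Mapping, Optional, Tuple


def xy_from_bunch(bunch: Mapping[str, Any], split: Optional[str] = None) -> Tuple[Any, Any]:
    """Hypothetical helper: return (data, target) from a Bunch-like mapping.

    Assumes single-split datasets use the keys 'data'/'target' and multi-split
    datasets use 'data_<split>'/'target_<split>' (e.g. 'data_train').
    """
    suffix = "" if split is None else f"_{split}"
    return bunch["data" + suffix], bunch["target" + suffix]


def load_ecg(as_bunch: bool = False):
    # Lazy import so the helper above can be used without alibi-detect installed.
    from alibi_detect.datasets import fetch_ecg

    if not as_bunch:
        # Tuple style (assumed layout): ((X_train, y_train), (X_test, y_test)).
        return fetch_ecg(return_X_y=True)
    # Bunch style: a dict subclass, so plain key access works.
    ecg = fetch_ecg()
    return xy_from_bunch(ecg, "train"), xy_from_bunch(ecg, "test")
```

Both branches of `load_ecg` return the same nested structure, which makes it easy to switch between the lightweight tuple output and the Bunch when you also need metadata.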
Genome Dataset: fetch_genome
Bacteria genomics dataset for out-of-distribution (OOD) detection, released as part of . From the original TL;DR: the dataset contains genomic sequences of 250 base pairs from 10 in-distribution bacteria classes for training, 60 OOD bacteria classes for validation, and another 60 different OOD bacteria classes for test. The training, validation and test sets contain 1, 7 and 7 million sequences, respectively. For detailed info on the dataset check the .
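A sketch that loads the three splits and summarizes their size and OOD share, a quick way to confirm the split composition described above. It assumes `fetch_genome(return_X_y=True)` returns `(X, y)` pairs for train, validation and test, with `y` a 0/1 outlier flag; treat that layout as an assumption to verify. `summarize_splits` and `load_genome_summary` are hypothetical helpers, and the download itself is large.

```python
from typing import Dict, Sequence, Tuple


def summarize_splits(splits: Dict[str, Sequence[int]]) -> Dict[str, Tuple[int, float]]:
    """Hypothetical helper: map each split name to (n_samples, fraction flagged as OOD).

    Assumes each flag is 1 for an out-of-distribution sequence and 0 otherwise.
    """
    summary = {}
    for name, flags in splits.items():
        n = len(flags)
        n_ood = sum(int(f) for f in flags)
        summary[name] = (n, n_ood / n if n else 0.0)
    return summary


def load_genome_summary() -> Dict[str, Tuple[int, float]]:
    # Lazy import: requires alibi-detect and a multi-million-sequence download.
    from alibi_detect.datasets import fetch_genome

    # Assumed return layout: train, validation and test (data, outlier flag) pairs.
    (X_train, y_train), (X_val, y_val), (X_test, y_test) = fetch_genome(return_X_y=True)
    return summarize_splits({"train": y_train, "val": y_val, "test": y_test})
```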
ECG 5000: fetch_ecg
5000 ECGs, originally obtained from .
NAB: fetch_nab
Any univariate time series from the Numenta Anomaly Benchmark (NAB), returned in a DataFrame. A list of the available time series can be retrieved using alibi_detect.datasets.get_list_nab().
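A sketch for picking and loading a NAB series. It assumes `get_list_nab()` returns identifiers of the form `'<category>/<name>'` (for example a `realKnownCause/...` entry) and that `fetch_nab(ts)` returns a Bunch with the DataFrame under `'data'` and 0/1 anomaly labels under `'target'`; both are assumptions to check against the API reference. `filter_series`, `anomaly_positions` and `load_nab_series` are hypothetical helpers.

```python
from typing import Iterable, List, Sequence


def filter_series(names: Iterable[str], category: str) -> List[str]:
    """Hypothetical helper: keep identifiers that start with '<category>/', sorted."""
    prefix = category.rstrip("/") + "/"
    return sorted(n for n in names if n.startswith(prefix))


def anomaly_positions(labels: Sequence[int]) -> List[int]:
    """Hypothetical helper: positions whose 0/1 label marks an anomaly."""
    return [i for i, v in enumerate(labels) if int(v) == 1]


def load_nab_series(category: str = "realKnownCause"):
    # Lazy imports: require alibi-detect (which depends on NumPy) and network access.
    import numpy as np
    from alibi_detect.datasets import fetch_nab, get_list_nab

    candidates = filter_series(get_list_nab(), category)
    if not candidates:
        raise ValueError(f"no NAB series in category {category!r}")
    ts = fetch_nab(candidates[0])
    # Flatten the labels whether they come back as an array, Series or one-column DataFrame.
    labels = np.asarray(ts["target"]).ravel()
    return ts["data"], anomaly_positions(labels)
```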
CIFAR-10-C (fetch_cifar10c), Adversarial CIFAR-10 (fetch_attack) and KDD Cup '99 (fetch_kdd)
CIFAR-10-C () contains the test set of CIFAR-10, corrupted and perturbed by various types of noise, blur, brightness etc. at different levels of severity, which leads to a gradual decline in the performance of a classification model trained on CIFAR-10. fetch_cifar10c allows you to pick any severity level or corruption type. The list of available corruption types can be retrieved with alibi_detect.datasets.corruption_types_cifar10c(). The dataset can be used in research on robustness and drift. The original data can be found . Example:
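A sketch that validates a CIFAR-10-C request before starting the download. It assumes `fetch_cifar10c(corruption=..., severity=..., return_X_y=True)` returns `(X, y)` and that severity levels run from 1 to 5, as in the original CIFAR-10-C release. `check_request` and `load_cifar10c` are hypothetical helpers.

```python
from typing import Iterable, List, Sequence, Union


def check_request(corruption: Union[str, Sequence[str]], severity: int,
                  available: Iterable[str]) -> List[str]:
    """Hypothetical helper: normalise and validate a CIFAR-10-C request.

    Returns the requested corruption types as a list; raises ValueError on
    unknown corruption types or a severity outside 1..5 (assumed range).
    """
    requested = [corruption] if isinstance(corruption, str) else list(corruption)
    unknown = sorted(set(requested) - set(available))
    if unknown:
        raise ValueError(f"unknown corruption types: {unknown}")
    if not 1 <= severity <= 5:
        raise ValueError(f"severity must be in 1..5, got {severity}")
    return requested


def load_cifar10c(corruption: Union[str, Sequence[str]], severity: int):
    # Lazy import: requires alibi-detect and a sizeable download.
    from alibi_detect.datasets import corruption_types_cifar10c, fetch_cifar10c

    requested = check_request(corruption, severity, corruption_types_cifar10c())
    X, y = fetch_cifar10c(corruption=requested, severity=severity, return_X_y=True)
    return X, y
```

Validating against `corruption_types_cifar10c()` first means a typo in a corruption name fails fast instead of after a partial download.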
fetch_attack loads adversarial instances crafted against a ResNet-56 classifier trained on CIFAR-10. Available attacks: ('cw') and ('slide'). Example:
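A sketch for measuring how effective an attack is against your own classifier. It assumes `fetch_attack("cifar10", "resnet56", attack, return_X_y=True)` returns `((X_train, y_train), (X_test, y_test))` with adversarial images paired with their original labels; the argument strings and the return layout are assumptions to verify. `attack_success_rate` and `load_adversarial_test_set` are hypothetical helpers, and predictions come from whatever model you supply.

```python
from typing import Sequence

ATTACKS = ("cw", "slide")  # attack names listed in the text above


def attack_success_rate(clean_pred: Sequence[int], adv_pred: Sequence[int],
                        labels: Sequence[int]) -> float:
    """Hypothetical helper: share of originally correct predictions the attack flips.

    Only instances the model classifies correctly on clean inputs are counted.
    """
    correct = [i for i, (p, y) in enumerate(zip(clean_pred, labels)) if p == y]
    if not correct:
        return 0.0
    flipped = sum(1 for i in correct if adv_pred[i] != labels[i])
    return flipped / len(correct)


def load_adversarial_test_set(attack: str = "cw"):
    if attack not in ATTACKS:
        raise ValueError(f"attack must be one of {ATTACKS}")
    # Lazy import: requires alibi-detect and network access.
    from alibi_detect.datasets import fetch_attack

    # Assumed layout: adversarial images paired with the original labels.
    (_, _), (X_test_adv, y_test) = fetch_attack("cifar10", "resnet56", attack, return_X_y=True)
    return X_test_adv, y_test
```

Restricting the rate to instances that were correct on clean inputs keeps the metric from crediting the attack with errors the model already made.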
Dataset with different types of computer network intrusions. fetch_kdd allows you to select a subset of network intrusions as targets or pick only specified features. The original data can be found .
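A sketch that selects two intrusion types and a few features, then draws a batch with a chosen outlier percentage, a common setup when evaluating an outlier detector. It assumes `fetch_kdd` accepts `target`, `keep_cols` and `percent10` arguments and returns a Bunch with `'data'` and a 0/1 `'target'` (1 = intrusion). `outlier_batch` and `load_kdd_batch` are hypothetical helpers, not part of alibi-detect.

```python
import random
from typing import List, Sequence, Tuple


def outlier_batch(target: Sequence[int], n_samples: int, perc_outlier: float,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: sample indices so that perc_outlier % of the batch are outliers.

    Returns (indices, labels) for the sampled batch, shuffled together.
    """
    rng = random.Random(seed)
    inliers = [i for i, t in enumerate(target) if int(t) == 0]
    outliers = [i for i, t in enumerate(target) if int(t) == 1]
    n_out = int(round(n_samples * perc_outlier / 100))
    n_in = n_samples - n_out
    if n_out > len(outliers) or n_in > len(inliers):
        raise ValueError("not enough instances for the requested batch")
    idx = rng.sample(inliers, n_in) + rng.sample(outliers, n_out)
    rng.shuffle(idx)
    return idx, [int(target[i]) for i in idx]


def load_kdd_batch(n_samples: int = 1000, perc_outlier: float = 5.0):
    # Lazy imports: require alibi-detect (which depends on NumPy) and network access.
    import numpy as np
    from alibi_detect.datasets import fetch_kdd

    kdd = fetch_kdd(target=["dos", "r2l"],
                    keep_cols=["srv_count", "serror_rate", "srv_serror_rate"],
                    percent10=True)
    idx, labels = outlier_batch(kdd["target"], n_samples, perc_outlier)
    # np.asarray makes row indexing work whether 'data' is an array or a DataFrame.
    return np.asarray(kdd["data"])[idx], labels
```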