alibi_detect.datasets
Constants
logger
logger: logging.Logger = <Logger alibi_detect.datasets (WARNING)>
TIMEOUT
TIMEOUT: int = 10
Functions
corruption_types_cifar10c
corruption_types_cifar10c() -> List[str]
Retrieve list with corruption types used in CIFAR-10-C.
Returns
Type:
List[str]
fetch_attack
fetch_attack(dataset: str, model: str, attack: str, return_X_y: bool = False) -> Union[alibi_detect.utils.data.Bunch, Tuple[Tuple[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray]]]
Load adversarial instances for a given dataset, model and attack type.
Parameters
dataset (str): Dataset under attack.
model (str): Model under attack.
attack (str): Attack name.
return_X_y (bool, default False): If True, return only the data and target values; otherwise return a Bunch object.
Returns
Type:
Union[alibi_detect.utils.data.Bunch, Tuple[Tuple[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray]]]
fetch_cifar10c
fetch_cifar10c(corruption: Union[str, List[str]], severity: int, return_X_y: bool = False) -> Union[alibi_detect.utils.data.Bunch, Tuple[numpy.ndarray, numpy.ndarray]]
Fetch CIFAR-10-C data. Originally obtained from https://zenodo.org/record/2535967#.XkKh2XX7Qts and introduced in "Hendrycks, D. and Dietterich, T. G. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. In 7th International Conference on Learning Representations, 2019."
Parameters
corruption (Union[str, List[str]]): Corruption type. Options can be checked with corruption_types_cifar10c(). Alternatively, specify 'all' for all corruptions at a severity level.
severity (int): Severity level of corruption (1-5).
return_X_y (bool, default False): If True, return only the data and target values; otherwise return a Bunch object.
Returns
Type:
Union[alibi_detect.utils.data.Bunch, Tuple[numpy.ndarray, numpy.ndarray]]
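A minimal usage sketch for fetch_cifar10c, assuming alibi_detect is installed and the remote data is reachable. The helper names validate_severity and load_corrupted_cifar10 are hypothetical and not part of the library; only fetch_cifar10c comes from this module. The import is deferred into the loader so the range check can run without triggering a download.

```python
from typing import List, Union


def validate_severity(severity: int) -> int:
    """Check that severity lies in the documented 1-5 range (hypothetical helper)."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(severity, bool) or not isinstance(severity, int):
        raise ValueError(f"severity must be an integer in 1-5, got {severity!r}")
    if not 1 <= severity <= 5:
        raise ValueError(f"severity must be an integer in 1-5, got {severity!r}")
    return severity


def load_corrupted_cifar10(corruption: Union[str, List[str]], severity: int):
    """Fetch CIFAR-10-C as a (data, target) tuple.

    Requires alibi_detect and network access; hypothetical wrapper.
    """
    # Deferred import: the validation helper stays usable without alibi_detect.
    from alibi_detect.datasets import fetch_cifar10c

    return fetch_cifar10c(
        corruption=corruption,
        severity=validate_severity(severity),
        return_X_y=True,  # ask for the tuple rather than the Bunch object
    )
```

A call such as load_corrupted_cifar10(corruption_types_cifar10c()[:2], severity=3) would then request the first two listed corruption types at mid severity, provided corruption_types_cifar10c has been imported from the same module.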
fetch_ecg
fetch_ecg(return_X_y: bool = False) -> Union[alibi_detect.utils.data.Bunch, Tuple[Tuple[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray]]]
Fetch ECG5000 data. The dataset contains 5000 ECGs, originally obtained from Physionet (https://archive.physionet.org/cgi-bin/atm/ATM) under the name "BIDMC Congestive Heart Failure Database (chfdb)", record "chf07".
Parameters
return_X_y (bool, default False): If True, return only the data and target values; otherwise return a Bunch object.
Returns
Type:
Union[alibi_detect.utils.data.Bunch, Tuple[Tuple[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray]]]
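The nested tuple in the return type can be unpacked in a single statement. The sketch below assumes the two inner tuples are two (data, target) splits, which the signature suggests but does not state. split_sizes and load_ecg_sizes are hypothetical helpers, and the loader needs alibi_detect plus network access.

```python
from typing import Sequence, Tuple

Pair = Tuple[Sequence, Sequence]


def split_sizes(splits: Tuple[Pair, Pair]) -> Tuple[int, int]:
    """Return the number of instances in each (data, target) pair."""
    (X_a, y_a), (X_b, y_b) = splits
    # Each pair should hold exactly one target per instance.
    if len(X_a) != len(y_a) or len(X_b) != len(y_b):
        raise ValueError("data and target lengths differ")
    return len(X_a), len(X_b)


def load_ecg_sizes() -> Tuple[int, int]:
    """Fetch ECG5000 and report the size of each split (hypothetical wrapper)."""
    # Deferred import so split_sizes works without alibi_detect installed.
    from alibi_detect.datasets import fetch_ecg

    return split_sizes(fetch_ecg(return_X_y=True))
```

Because split_sizes relies only on len, it works on NumPy arrays and on plain lists alike.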
fetch_genome
fetch_genome(return_X_y: bool = False, return_labels: bool = False) -> Union[alibi_detect.utils.data.Bunch, tuple]
Load genome data including their labels and whether they are outliers or not. More details about the data can be found in the readme on https://console.cloud.google.com/storage/browser/seldon-datasets/genome/. The original data can be found here: https://drive.google.com/drive/folders/1Ht9xmzyYPbDouUTl_KQdLTJQYX2CuclR.
Parameters
return_X_y (bool, default False): If True, return only the data and target values; otherwise return a Bunch object.
return_labels (bool, default False): Whether to return the genome labels, which are detailed in the label_json dict of the returned Bunch object.
Returns
Type:
Union[alibi_detect.utils.data.Bunch, tuple]
fetch_kdd
fetch_kdd(target: list = ['dos', 'r2l', 'u2r', 'probe'], keep_cols: list = ['srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate'], percent10: bool = True, return_X_y: bool = False) -> Union[alibi_detect.utils.data.Bunch, Tuple[numpy.ndarray, numpy.ndarray]]
KDD Cup '99 dataset. Detect computer network intrusions.
Parameters
target (list, default ['dos', 'r2l', 'u2r', 'probe']): List with attack types to detect.
keep_cols (list): List with columns to keep. Defaults to the 18 continuous features listed in the signature.
percent10 (bool, default True): Whether to return only 10% of the data.
return_X_y (bool, default False): If True, return only the data and target values; otherwise return a Bunch object.
Returns
Type:
Union[alibi_detect.utils.data.Bunch, Tuple[numpy.ndarray, numpy.ndarray]]
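A short sketch of narrowing fetch_kdd to a subset of attack types. It assumes the four names in the default target list are the only valid values, which the signature does not guarantee. check_targets and load_kdd are hypothetical helpers, and the loader needs alibi_detect plus network access.

```python
from typing import List

# Attack categories taken from the default value of `target` in the signature.
KDD_TARGETS = ['dos', 'r2l', 'u2r', 'probe']


def check_targets(target: List[str]) -> List[str]:
    """Reject attack types outside the documented default list."""
    unknown = [t for t in target if t not in KDD_TARGETS]
    if unknown:
        raise ValueError(f"unknown attack types: {unknown}")
    # Return a copy so the caller's list is never shared with the library.
    return list(target)


def load_kdd(target: List[str]):
    """Fetch the 10% KDD Cup '99 subset as a (data, target) tuple."""
    # Deferred import so check_targets works without alibi_detect installed.
    from alibi_detect.datasets import fetch_kdd

    return fetch_kdd(target=check_targets(target), percent10=True, return_X_y=True)
```

Copying the list matters because the signature uses mutable list defaults, so passing fresh lists avoids accidental sharing between calls.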
fetch_nab
fetch_nab(ts: str, return_X_y: bool = False) -> Union[alibi_detect.utils.data.Bunch, Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]
Get time series in a DataFrame from the Numenta Anomaly Benchmark: https://github.com/numenta/NAB.
Parameters
ts (str): Time series to retrieve. The possible values can be listed with get_list_nab().
return_X_y (bool, default False): If True, return only the data and target values; otherwise return a Bunch object.
Returns
Type:
Union[alibi_detect.utils.data.Bunch, Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]
get_list_nab
get_list_nab() -> list
Get list of possible time series to retrieve from the Numenta Anomaly Benchmark: https://github.com/numenta/NAB.
Returns
Type:
list
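get_list_nab and fetch_nab are naturally used together: list the available series, pick one, then fetch it. The sketch below assumes the returned list contains strings. pick_series and load_first_match are hypothetical helpers, and the loader needs alibi_detect plus network access.

```python
from typing import List


def pick_series(names: List[str], keyword: str) -> List[str]:
    """Return the names that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    return [name for name in names if needle in name.lower()]


def load_first_match(keyword: str):
    """Fetch the first NAB time series whose name contains `keyword`."""
    # Deferred import so pick_series works without alibi_detect installed.
    from alibi_detect.datasets import fetch_nab, get_list_nab

    matches = pick_series(get_list_nab(), keyword)
    if not matches:
        raise LookupError(f"no NAB time series matches {keyword!r}")
    # With return_X_y=True the signature promises a pair of DataFrames.
    return fetch_nab(matches[0], return_X_y=True)
```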
google_bucket_list
google_bucket_list(url: str, folder: str, filetype: Optional[str] = None, full_path: bool = False) -> List[str]
Retrieve a list of the items in a Google bucket folder.
Parameters
url (str): Bucket directory.
folder (str): Folder to retrieve list of items from.
filetype (Optional[str], default None): File extension, e.g. npy for saved numpy arrays.
full_path (bool, default False)
Returns
Type:
List[str]
load_genome_npz
load_genome_npz(fold: str, return_labels: bool = False) -> Union[Tuple[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]
Parameters
fold (str)
return_labels (bool, default False)
Returns
Type:
Union[Tuple[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]
load_url_arff
load_url_arff(url: str, dtype: type[numpy.generic] = <class 'numpy.float32'>) -> numpy.ndarray
Load arff files from url.
Parameters
url (str): Address of arff file.
dtype (type[numpy.generic], default <class 'numpy.float32'>)
Returns
Type:
numpy.ndarray