alibi.utils.distance

Functions

`abdm`

abdm(X: numpy.ndarray, cat_vars: dict, cat_vars_bin: dict = {})

Calculate the pair-wise distances between categories of a categorical variable using the Association-Based Distance Metric based on Le et al (2005). http://www.jaist.ac.jp/~bao/papers/N26.pdf

Name

Type

Default

Description

X

numpy.ndarray

Batch of arrays.

cat_vars

dict

Dict with as keys the categorical columns and as optional values the number of categories per categorical variable.

cat_vars_bin

dict

{}

Dict with as keys the binned numerical columns and as optional values the number of bins per variable.

`batch_compute_kernel_matrix`

batch_compute_kernel_matrix(x: Union[list, numpy.ndarray], y: Union[list, numpy.ndarray], kernel: Callable[[.[<class 'numpy.ndarray'>, <class 'numpy.ndarray'>]], numpy.ndarray], batch_size: int = 10000000000, preprocess_fn: Optional[Callable[[.[typing.Union[list, numpy.ndarray]]], numpy.ndarray]] = None) -> numpy.ndarray

Compute the kernel matrix between x and y by filling in blocks of size batch_size x batch_size at a time.

Name

Type

Default

Description

x

Union[list, numpy.ndarray]

The first list/numpy array of data instances.

y

Union[list, numpy.ndarray]

The second list/numpy array of data instances.

kernel

Callable[[.[<class 'numpy.ndarray'>, <class 'numpy.ndarray'>]], numpy.ndarray]

Kernel function to be used for kernel matrix computation.

batch_size

int

10000000000

Batch size to be used for each prediction.

preprocess_fn

Optional[Callable[[.[typing.Union[list, numpy.ndarray]]], numpy.ndarray]]

None

Optional preprocessing function for each batch.

Returns

Type: numpy.ndarray

`cityblock_batch`

cityblock_batch(X: numpy.ndarray, y: numpy.ndarray) -> numpy.ndarray

Calculate the L1 distances between a batch of arrays X and an array of the same shape y.

Name

Type

Default

Description

X

numpy.ndarray

Batch of arrays to calculate the distances from.

y

numpy.ndarray

Array to calculate the distance to.

Returns

Type: numpy.ndarray

`multidim_scaling`

multidim_scaling(d_pair: dict, feature_range: Tuple[numpy.ndarray, numpy.ndarray], n_components: int = 2, use_metric: bool = True, standardize_cat_vars: bool = True, smooth: float = 1.0, center: bool = True, update_feature_range: bool = True) -> Tuple[dict, tuple]

Apply multidimensional scaling to pairwise distance matrices.

Name

Type

Default

Description

d_pair

dict

Dict with as keys the column index of the categorical variables and as values a pairwise distance matrix for the categories of the variable.

feature_range

Tuple[numpy.ndarray, numpy.ndarray]

Tuple with min and max ranges to allow for perturbed instances. Min and max ranges are numpy arrays with dimension (1 x nb of features).

n_components

int

2

Number of dimensions in which to immerse the dissimilarities.

use_metric

bool

True

If True, perform metric MDS; otherwise, perform nonmetric MDS.

standardize_cat_vars

bool

True

Standardize numerical values of categorical variables if True.

smooth

float

1.0

Smoothing exponent between 0 and 1 for the distances. Lower values than 1 will smooth the difference in distance metric between different features.

center

bool

True

Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

update_feature_range

bool

True

Update feature range with scaled values.

Returns

Type: Tuple[dict, tuple]

`mvdm`

mvdm(X: numpy.ndarray, y: numpy.ndarray, cat_vars: dict, alpha: int = 1) -> Dict[int, numpy.ndarray]

Calculate the pair-wise distances between categories of a categorical variable using the Modified Value Difference Measure based on Cost et al (1993). https://link.springer.com/article/10.1023/A:1022664626993

Name

Type

Default

Description

X

numpy.ndarray

Batch of arrays.

y

numpy.ndarray

Batch of labels or predictions.

cat_vars

dict

Dict with as keys the categorical columns and as optional values the number of categories per categorical variable.

alpha

int

1

Power of absolute difference between conditional probabilities.

Returns

Type: Dict[int, numpy.ndarray]

`squared_pairwise_distance`

squared_pairwise_distance(x: numpy.ndarray, y: numpy.ndarray, a_min: float = 1e-07, a_max: float = 1e+30) -> numpy.ndarray

numpy pairwise squared Euclidean distance between samples x and y.

Name

Type

Default

Description

x

numpy.ndarray

A batch of instances of shape Nx x features.

y

numpy.ndarray

A batch of instances of shape Ny x features.

a_min

float

1e-07

Lower bound to clip distance values.

a_max

float

1e+30

Upper bound to clip distance values.

Returns

Type: numpy.ndarray

Previousalibi.utils.discretizer Nextalibi.utils.distributed

Last updated 1 month ago

Was this helpful?