alibi.utils.distance
Functions
abdm
abdm(X: numpy.ndarray, cat_vars: dict, cat_vars_bin: dict = {})
Calculate the pairwise distances between categories of a categorical variable using the Association-Based Distance Metric, based on Le et al. (2005). http://www.jaist.ac.jp/~bao/papers/N26.pdf
X
numpy.ndarray
Batch of arrays.
cat_vars
dict
Dict whose keys are the categorical columns and whose optional values are the number of categories per categorical variable.
cat_vars_bin
dict
{}
Dict whose keys are the binned numerical columns and whose optional values are the number of bins per variable.
batch_compute_kernel_matrix
batch_compute_kernel_matrix(x: Union[list, numpy.ndarray], y: Union[list, numpy.ndarray], kernel: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], batch_size: int = 10000000000, preprocess_fn: Optional[Callable[[Union[list, numpy.ndarray]], numpy.ndarray]] = None) -> numpy.ndarray
Compute the kernel matrix between x and y by filling in blocks of size batch_size x batch_size at a time.
x
Union[list, numpy.ndarray]
The first list or NumPy array of data instances.
y
Union[list, numpy.ndarray]
The second list or NumPy array of data instances.
kernel
Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray]
Kernel function to be used for kernel matrix computation.
batch_size
int
10000000000
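Batch size to be used for each block: the kernel is evaluated on at most batch_size instances of x against at most batch_size instances of y at a time.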
preprocess_fn
Optional[Callable[[Union[list, numpy.ndarray]], numpy.ndarray]]
None
Optional preprocessing function for each batch.
Returns
Type:
numpy.ndarray
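The block-filling strategy described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the names `blockwise_kernel_matrix` and `rbf_kernel` are hypothetical, and the Gaussian RBF kernel is only an example of a callable with the documented `kernel` signature.

```python
from typing import Callable, Optional

import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Example kernel (hypothetical helper): Gaussian RBF between rows of a and b."""
    sq = (a ** 2).sum(-1)[:, None] + (b ** 2).sum(-1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def blockwise_kernel_matrix(
    x,
    y,
    kernel: Callable[[np.ndarray, np.ndarray], np.ndarray],
    batch_size: int = 2,
    preprocess_fn: Optional[Callable] = None,
) -> np.ndarray:
    """Assemble the full kernel matrix from blocks of at most batch_size x batch_size."""
    row_blocks = []
    for i in range(0, len(x), batch_size):
        x_batch = x[i:i + batch_size]
        if preprocess_fn is not None:
            x_batch = preprocess_fn(x_batch)
        col_blocks = []
        for j in range(0, len(y), batch_size):
            y_batch = y[j:j + batch_size]
            if preprocess_fn is not None:
                y_batch = preprocess_fn(y_batch)
            # Each block holds kernel values for one (x batch, y batch) pair.
            col_blocks.append(kernel(x_batch, y_batch))
        row_blocks.append(np.concatenate(col_blocks, axis=1))
    return np.concatenate(row_blocks, axis=0)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = rng.normal(size=(4, 3))
K = blockwise_kernel_matrix(x, y, rbf_kernel, batch_size=2)
```

A small `batch_size` bounds the memory used per kernel call at the cost of more calls; the assembled matrix should match a single full evaluation of the kernel.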
cityblock_batch
cityblock_batch(X: numpy.ndarray, y: numpy.ndarray) -> numpy.ndarray
Calculate the L1 distances between a batch of arrays X and an array of the same shape y.
X
numpy.ndarray
Batch of arrays to calculate the distances from.
y
numpy.ndarray
Array to calculate the distance to.
Returns
Type:
numpy.ndarray
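As a rough illustration of the L1 (city block) distance described above, the following NumPy sketch sums absolute differences over all non-batch axes. The function name `cityblock_batch_sketch` is hypothetical, and the one-dimensional output shape is an assumption; the library function may shape its result differently.

```python
import numpy as np


def cityblock_batch_sketch(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """L1 distance from each instance in X to y, summed over all non-batch axes."""
    # y is assumed to have the same number of dimensions as X, with a batch
    # axis of length 1 so that it broadcasts against every instance in X.
    feature_axes = tuple(range(1, X.ndim))
    return np.abs(X - y).sum(axis=feature_axes)


X = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, -4.0]])
y = np.array([[1.0, 1.0]])
distances = cityblock_batch_sketch(X, y)  # one distance per row of X
```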
multidim_scaling
multidim_scaling(d_pair: dict, feature_range: Tuple[numpy.ndarray, numpy.ndarray], n_components: int = 2, use_metric: bool = True, standardize_cat_vars: bool = True, smooth: float = 1.0, center: bool = True, update_feature_range: bool = True) -> Tuple[dict, tuple]
Apply multidimensional scaling to pairwise distance matrices.
d_pair
dict
Dict whose keys are the column indices of the categorical variables and whose values are the pairwise distance matrices for the categories of each variable.
feature_range
Tuple[numpy.ndarray, numpy.ndarray]
Tuple with min and max ranges to allow for perturbed instances. Min and max ranges are NumPy arrays with dimension (1 x nb of features).
n_components
int
2
Number of dimensions in which to immerse the dissimilarities.
use_metric
bool
True
If True, perform metric MDS; otherwise, perform nonmetric MDS.
standardize_cat_vars
bool
True
Standardize numerical values of categorical variables if True.
smooth
float
1.0
Smoothing exponent between 0 and 1 for the distances. Values lower than 1 smooth the difference in distance metric between different features.
center
bool
True
Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.
update_feature_range
bool
True
Update feature range with scaled values.
Returns
Type:
Tuple[dict, tuple]
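To make the idea of embedding categories from a pairwise distance matrix concrete, here is a minimal sketch using classical (Torgerson) MDS in NumPy. This is a stand-in technique chosen because it is short and deterministic; this reference does not state which MDS solver the library uses, and the sketch covers only the embedding step, not the standardization, smoothing, centering, or feature-range update. The name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(d: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Embed n items in n_components dimensions from an n x n distance matrix."""
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (d ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the largest eigenvalues; clip small negatives from round-off.
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)


# Hypothetical input in the shape of d_pair: one categorical column (index 0)
# with three categories whose pairwise distances form a 3-4-5 triangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
d_pair = {0: d}
embedding = {col: classical_mds(m, n_components=2) for col, m in d_pair.items()}
```

Each category of column 0 is mapped to a numerical coordinate vector whose Euclidean distances reproduce the input matrix, which is what allows categorical values to be treated on a numerical scale afterwards.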
mvdm
mvdm(X: numpy.ndarray, y: numpy.ndarray, cat_vars: dict, alpha: int = 1) -> Dict[int, numpy.ndarray]
Calculate the pairwise distances between categories of a categorical variable using the Modified Value Difference Measure, based on Cost et al. (1993). https://link.springer.com/article/10.1023/A:1022664626993
X
numpy.ndarray
Batch of arrays.
y
numpy.ndarray
Batch of labels or predictions.
cat_vars
dict
Dict whose keys are the categorical columns and whose optional values are the number of categories per categorical variable.
alpha
int
1
Power of absolute difference between conditional probabilities.
Returns
Type:
Dict[int, numpy.ndarray]
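The measure can be illustrated with a short NumPy sketch: for two categories a and b of one variable, the distance is the sum over classes of |P(class | a) - P(class | b)| raised to the power alpha. This is a sketch of that formula, not the library's code; the name `mvdm_sketch` is hypothetical, and it assumes integer-encoded categories and that `cat_vars` always supplies the number of categories.

```python
import numpy as np


def mvdm_sketch(X: np.ndarray, y: np.ndarray, cat_vars: dict, alpha: int = 1) -> dict:
    """Pairwise category distances per column from class-conditional probabilities."""
    classes = np.unique(y)
    d_pair = {}
    for col, n_cat in cat_vars.items():
        values = X[:, col].astype(int)
        # Number of instances per category (avoid division by zero for unseen ones).
        counts = np.maximum(np.bincount(values, minlength=n_cat), 1).astype(float)
        # cond[k, c] = P(y == classes[k] | value == c)
        cond = np.zeros((len(classes), n_cat))
        for k, label in enumerate(classes):
            cond[k] = np.bincount(values[y == label], minlength=n_cat) / counts
        # Sum |P(class | a) - P(class | b)| ** alpha over classes for every pair (a, b).
        diff = np.abs(cond[:, :, None] - cond[:, None, :]) ** alpha
        d_pair[col] = diff.sum(axis=0)
    return d_pair


X = np.array([[0], [0], [1], [1], [2], [2]])
y = np.array([0, 0, 1, 1, 0, 1])
d_pair = mvdm_sketch(X, y, cat_vars={0: 3})
```

In this toy data, categories 0 and 1 predict opposite classes and are therefore far apart, while category 2 is split evenly between the classes and sits halfway between them.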
squared_pairwise_distance
squared_pairwise_distance(x: numpy.ndarray, y: numpy.ndarray, a_min: float = 1e-07, a_max: float = 1e+30) -> numpy.ndarray
NumPy pairwise squared Euclidean distance between samples x and y.
x
numpy.ndarray
A batch of instances of shape Nx x features.
y
numpy.ndarray
A batch of instances of shape Ny x features.
a_min
float
1e-07
Lower bound to clip distance values.
a_max
float
1e+30
Upper bound to clip distance values.
Returns
Type:
numpy.ndarray
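A minimal NumPy sketch of this computation uses the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y followed by clipping. The name `squared_pairwise_distance_sketch` is hypothetical, and the sketch assumes two-dimensional inputs as documented above.

```python
import numpy as np


def squared_pairwise_distance_sketch(
    x: np.ndarray, y: np.ndarray, a_min: float = 1e-7, a_max: float = 1e30
) -> np.ndarray:
    """Return an Nx x Ny matrix of clipped squared Euclidean distances."""
    x2 = np.sum(x ** 2, axis=-1, keepdims=True)  # shape (Nx, 1)
    y2 = np.sum(y ** 2, axis=-1, keepdims=True)  # shape (Ny, 1)
    dist = x2 + y2.T - 2.0 * x @ y.T
    # Clipping guards against tiny negative values from round-off and
    # against overflow; note that exact zeros become a_min.
    return np.clip(dist, a_min, a_max)


x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0, 0.0], [3.0, 4.0]])
D = squared_pairwise_distance_sketch(x, y)
```

Because of the lower bound, identical instances yield `a_min` rather than exactly zero, which keeps downstream operations such as logarithms or divisions well defined.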