alibi.explainers.anchors.anchor_tabular
Constants
DEFAULT_DATA_ANCHOR
DEFAULT_DATA_ANCHOR: dict = {'anchor': [], 'precision': None, 'coverage': None, 'raw': None}
DEFAULT_META_ANCHOR
DEFAULT_META_ANCHOR: dict = {'name': None, 'type': ['blackbox'], 'explanations': ['local'], 'params': {},...
AnchorTabular
Inherits from: Explainer, FitMixin, ABC, Base
Constructor
AnchorTabular(self, predictor: Callable[[numpy.ndarray], numpy.ndarray], feature_names: List[str], categorical_names: Optional[Dict[int, List[str]]] = None, dtype: Type[numpy.generic] = numpy.float32, ohe: bool = False, seed: Optional[int] = None) -> None
predictor
Callable[[numpy.ndarray], numpy.ndarray]
A callable that takes a numpy array of N data points as inputs and returns N outputs.
feature_names
List[str]
List with feature names.
categorical_names
Optional[Dict[int, List[str]]]
None
Dictionary where keys are feature columns and values are the categories for the feature.
dtype
type[numpy.generic]
numpy.float32
A numpy scalar type that corresponds to the type of input array expected by predictor. This may be used to construct arrays of the given type to be passed through the predictor. For most use cases this argument should have no effect, but it is exposed for use with predictors that would break when called with an array of unsupported type.
ohe
bool
False
Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.
seed
Optional[int]
None
Used to set the random number generator for repeatability purposes.
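For orientation, a minimal construction sketch. The classifier, data, and feature names below are hypothetical stand-ins, not part of this API:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

# Hypothetical tabular data: two numerical columns and one ordinally
# encoded categorical column (category ids 0, 1, 2).
rng = np.random.default_rng(0)
X_train = np.column_stack([
    rng.normal(size=1000),
    rng.normal(size=1000),
    rng.integers(0, 3, size=1000),
]).astype(np.float32)
y_train = (X_train[:, 0] + X_train[:, 2] > 1).astype(int)

clf = RandomForestClassifier().fit(X_train, y_train)

explainer = AnchorTabular(
    predictor=clf.predict,                      # N inputs -> N outputs
    feature_names=['age', 'income', 'status'],  # hypothetical names
    categorical_names={2: ['single', 'married', 'other']},
    seed=42,
)
```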
Properties
predictor
Optional[Callable]
Methods
add_names_to_exp
add_names_to_exp(explanation: dict) -> None
explanation
dict
Dict with anchors and additional metadata.
Returns
Type:
None
explain
explain(X: numpy.ndarray, threshold: float = 0.95, delta: float = 0.1, tau: float = 0.15, batch_size: int = 100, coverage_samples: int = 10000, beam_size: int = 1, stop_on_first: bool = False, max_anchor_size: Optional[int] = None, min_samples_start: int = 100, n_covered_ex: int = 10, binary_cache_size: int = 10000, cache_margin: int = 1000, verbose: bool = False, verbose_every: int = 1, **kwargs: typing.Any) -> alibi.api.interfaces.Explanation
X
numpy.ndarray
Instance to be explained.
threshold
float
0.95
Minimum anchor precision threshold. The algorithm tries to find an anchor that maximizes the coverage under the precision constraint, formally defined as $P(\mathrm{prec}(A) \ge t) \ge 1 - \delta$, where $A$ is an anchor, $t$ is the threshold parameter, $\delta$ is the delta parameter, and $\mathrm{prec}(\cdot)$ denotes the precision of an anchor. In other words, we seek an anchor whose precision is greater than or equal to the given threshold with a confidence of (1 - delta). A higher value guarantees that the anchors are faithful to the model, but also leads to more computation time. Note that there are cases in which the precision constraint cannot be satisfied due to the quantile-based discretisation of the numerical features; in that case, the best (i.e. highest coverage) non-eligible anchor is returned.
delta
float
0.1
Significance threshold. 1 - delta represents the confidence threshold for the anchor precision (see threshold) and the selection of the best anchor candidate in each iteration (see tau).
tau
float
0.15
Multi-armed bandit parameter used to select candidate anchors in each iteration. The multi-armed bandit algorithm tries to find, within a tolerance tau, the most promising (i.e. according to the precision) beam_size candidate anchor(s) from a list of proposed anchors. Formally, when beam_size=1, the multi-armed bandit algorithm seeks to find an anchor $A$ such that $P(\mathrm{prec}(A) \ge \mathrm{prec}(A^\star) - \tau) \ge 1 - \delta$, where $A^\star$ is the anchor with the highest true precision (which we don't know), $\tau$ is the tau parameter, $\delta$ is the delta parameter, and $\mathrm{prec}(\cdot)$ denotes the precision of an anchor. In other words, in each iteration the algorithm returns, with a probability of at least 1 - delta, an anchor $A$ whose precision is within an error tolerance tau of the precision of the highest-true-precision anchor $A^\star$. A bigger value for tau means faster convergence but also looser anchor conditions.
batch_size
int
100
Batch size used for sampling. The Anchor algorithm will query the black-box model in batches of size batch_size. A larger batch_size gives more confidence in the anchor, again at the expense of computation time since it involves more model prediction calls.
coverage_samples
int
10000
Number of samples used to estimate anchor coverage during the search.
beam_size
int
1
Number of candidate anchors selected by the multi-armed bandit algorithm in each iteration from a list of proposed anchors. A bigger beam width can lead to a better overall anchor (i.e. prevents the algorithm from getting stuck in a local maximum) at the expense of more computation time.
stop_on_first
bool
False
If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.
max_anchor_size
Optional[int]
None
Maximum number of features in the resulting anchor.
min_samples_start
int
100
Minimum number of initial samples.
n_covered_ex
int
10
How many examples where the anchor applies to store for each anchor sampled during the search (examples where the prediction on samples both agrees and disagrees with desired_label are stored).
binary_cache_size
int
10000
The result search pre-allocates binary_cache_size batches for storing the binary arrays returned during sampling.
cache_margin
int
1000
When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.
verbose
bool
False
Display updates during the anchor search iterations.
verbose_every
int
1
Frequency of displayed iterations during the anchor search process.
Returns
Type:
alibi.api.interfaces.Explanation
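A hedged usage sketch, reusing the hypothetical explainer and X_train from the constructor example above; it assumes fit (documented next) has already been called:

```python
# Explain a single instance (the explainer must have been fitted first).
explanation = explainer.explain(X_train[0], threshold=0.95)

# Result fields follow DEFAULT_DATA_ANCHOR above.
print(explanation.data['anchor'])     # list of predicates, e.g. ['status = single']
print(explanation.data['precision'])  # estimated anchor precision
print(explanation.data['coverage'])   # fraction of instances the anchor covers
```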
fit
fit(train_data: numpy.ndarray, disc_perc: Tuple[Union[int, float], ...] = (25, 50, 75), **kwargs) -> alibi.explainers.anchors.anchor_tabular.AnchorTabular
train_data
numpy.ndarray
Representative sample from the training data.
disc_perc
Tuple[Union[int, float], ...]
(25, 50, 75)
List with percentiles (int) used for discretization.
Returns
Type:
alibi.explainers.anchors.anchor_tabular.AnchorTabular
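A short sketch, reusing the hypothetical X_train from the constructor example; the decile variant is shown purely as an alternative to the default quartiles:

```python
# Default: discretise numerical features at the 25th/50th/75th percentiles.
explainer.fit(X_train)

# Alternatively, use deciles for a finer discretisation.
explainer.fit(X_train, disc_perc=tuple(range(10, 100, 10)))
```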
reset_predictor
reset_predictor(predictor: Callable) -> None
predictor
Callable
New predictor function.
Returns
Type:
None
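A one-line sketch; new_clf is a hypothetical retrained model with the same input/output contract as the original predictor:

```python
explainer.reset_predictor(new_clf.predict)
```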
TabularSampler
A sampler that uses an underlying training set to draw records that have a subset of features with values specified in an instance to be explained, X.
Constructor
TabularSampler(self, predictor: Callable, disc_perc: Tuple[Union[int, float], ...], numerical_features: List[int], categorical_features: List[int], feature_names: list, feature_values: dict, n_covered_ex: int = 10, seed: Optional[int] = None) -> None
predictor
Callable
A callable that takes a tensor of N data points as inputs and returns N outputs.
disc_perc
Tuple[Union[int, float], ...]
Percentiles used for numerical feature discretisation.
numerical_features
List[int]
Numerical features column IDs.
categorical_features
List[int]
Categorical features column IDs.
feature_names
list
Feature names.
feature_values
dict
Key: categorical feature column ID, value: values for the feature.
n_covered_ex
int
10
For each result, a number of samples where the prediction agrees/disagrees with the prediction on the instance to be explained are stored.
seed
Optional[int]
None
If set, fixes the random number sequence.
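To illustrate the idea (not the library's implementation): conditioned on an anchor, samples can be drawn by taking random training rows and forcing the anchored columns to the values (or, for discretised numerical features, the bins) they take in X. A self-contained sketch under that assumption:

```python
import numpy as np

def sample_with_anchor(train_data: np.ndarray, x: np.ndarray,
                       anchor_cols: list, num_samples: int,
                       seed: int = 0) -> np.ndarray:
    """Illustrative only: random training rows with the anchored
    columns overwritten by the values they take in x, so every
    returned sample satisfies a (categorical) anchor."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, train_data.shape[0], size=num_samples)
    samples = train_data[idx].copy()
    samples[:, anchor_cols] = x[anchor_cols]  # enforce the anchor
    return samples
```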
Methods
build_lookups
build_lookups(X: numpy.ndarray) -> List[Dict]
X
numpy.ndarray
Instance to be explained.
Returns
Type:
List[Dict]
compare_labels
compare_labels(samples: numpy.ndarray) -> numpy.ndarray
samples
numpy.ndarray
Samples whose labels are to be compared with the instance label.
Returns
Type:
numpy.ndarray
deferred_init
deferred_init(train_data: Union[numpy.ndarray, typing.Any], d_train_data: Union[numpy.ndarray, typing.Any]) -> typing.Any
train_data
Union[numpy.ndarray, typing.Any]
Data from which samples are drawn. Can be a numpy array or a ray future.
d_train_data
Union[numpy.ndarray, typing.Any]
Discretized version of the training data. Can be a numpy array or a ray future.
Returns
Type:
typing.Any
get_features_index
get_features_index(anchor: tuple) -> Tuple[Dict[int, set[int]], Dict[int, typing.Any], List[Tuple[int, str, Union[typing.Any, int]]]]
anchor
tuple
The anchor for which the training set row indices are to be retrieved. The ints represent encoded feature ids.
Returns
Type:
Tuple[Dict[int, set[int]], Dict[int, typing.Any], List[Tuple[int, str, Union[typing.Any, int]]]]
handle_unk_features
handle_unk_features(allowed_bins: Dict[int, set[int]], num_samples: int, samples: numpy.ndarray, unk_feature_values: List[Tuple[int, str, Union[typing.Any, int]]]) -> None
allowed_bins
Dict[int, set[int]]
See the alibi.explainers.anchors.anchor_tabular.TabularSampler.get_features_index method.
num_samples
int
Number of replacement values.
samples
numpy.ndarray
Contains the samples whose values are to be replaced.
unk_feature_values
List[Tuple[int, str, Union[typing.Any, int]]]
List of tuples where: element [0] is the original feature id, element [1] is the feature type, and element [2] is the replacement value if the feature is categorical, otherwise None.
Returns
Type:
None
perturbation
perturbation(anchor: tuple, num_samples: int) -> Tuple[numpy.ndarray, numpy.ndarray, float]
anchor
tuple
Each int is an encoded feature id.
num_samples
int
Number of samples.
Returns
Type:
Tuple[numpy.ndarray, numpy.ndarray, float]
replace_features
replace_features(samples: numpy.ndarray, allowed_rows: Dict[int, typing.Any], uniq_feat_ids: List[int], partial_anchor_rows: List[numpy.ndarray], nb_partial_anchors: numpy.ndarray, num_samples: int) -> None
samples
numpy.ndarray
Randomly drawn samples, where the anchor does not apply.
allowed_rows
Dict[int, typing.Any]
Maps feature ids to the row indices in the training set where the feature takes the same value as in the instance (for categorical features) or falls in the same bin.
uniq_feat_ids
List[int]
List of the unique original feature ids in the anchor. Note that multiple encoded features in the anchor can map to the same original feature id.
partial_anchor_rows
List[numpy.ndarray]
The rows in the training set where each partial anchor applies. Last entry is an array of row indices where the entire anchor applies.
nb_partial_anchors
numpy.ndarray
The number of training records which contain each partial anchor.
num_samples
int
Number of perturbed samples to be returned.
Returns
Type:
None
set_instance_label
set_instance_label(X: numpy.ndarray) -> None
X
numpy.ndarray
Instance to be explained.
Returns
Type:
None
set_n_covered
set_n_covered(n_covered: int) -> None
n_covered
int
Number of examples to be saved.
Returns
Type:
None