alibi.explainers.anchors.anchor_text
Constants
TYPE_CHECKING
TYPE_CHECKING
TYPE_CHECKING: bool = False
bool(x) -> bool
Returns True when the argument x is true, False otherwise. The builtins True and False are the only two instances of the class bool. The class bool is a subclass of the class int, and cannot be subclassed.
DEFAULT_DATA_ANCHOR
DEFAULT_DATA_ANCHOR
DEFAULT_DATA_ANCHOR: dict = {'anchor': [], 'precision': None, 'coverage': None, 'raw': None}
DEFAULT_META_ANCHOR
DEFAULT_META_ANCHOR
DEFAULT_META_ANCHOR: dict = {'name': None, 'type': ['blackbox'], 'explanations': ['local'], 'params': {},...
logger
logger
logger: logging.Logger = <Logger alibi.explainers.anchors.anchor_text (WARNING)>
Instances of the Logger class represent a single logging channel. A "logging channel" indicates an area of an application. Exactly how an "area" is defined is up to the application developer. Since an application can have any number of areas, logging channels are identified by a unique string. Application areas can be nested (e.g. an area of "input processing" might include sub-areas "read CSV files", "read XLS files" and "read Gnumeric files"). To cater for this natural nesting, channel names are organized into a namespace hierarchy where levels are separated by periods, much like the Java or Python package namespace. So in the instance given above, channel names might be "input" for the upper level, and "input.csv", "input.xls" and "input.gnu" for the sub-levels. There is no arbitrary limit to the depth of nesting.
DEFAULT_SAMPLING_UNKNOWN
DEFAULT_SAMPLING_UNKNOWN
DEFAULT_SAMPLING_UNKNOWN: dict = {'sample_proba': 0.5}
DEFAULT_SAMPLING_SIMILARITY
DEFAULT_SAMPLING_SIMILARITY
DEFAULT_SAMPLING_SIMILARITY: dict = {'sample_proba': 0.5, 'top_n': 100, 'temperature': 1.0, 'use_proba': False}
DEFAULT_SAMPLING_LANGUAGE_MODEL
DEFAULT_SAMPLING_LANGUAGE_MODEL
DEFAULT_SAMPLING_LANGUAGE_MODEL: dict = {'filling': 'parallel', 'sample_proba': 0.5, 'top_n': 100, 'temperature': 1.0...
AnchorText
AnchorText
Inherits from: Explainer
, ABC
, Base
Constructor
AnchorText(self, predictor: Callable[[List[str]], numpy.ndarray], sampling_strategy: str = 'unknown', nlp: Optional[ForwardRef('spacy.language.Language')] = None, language_model: Optional[ForwardRef('LanguageModel')] = None, seed: int = 0, **kwargs: Any) -> None
predictor
Callable[[.[typing.List[str]]], numpy.ndarray]
A callable that takes a list of text strings representing N
data points as inputs and returns N
outputs.
sampling_strategy
str
'unknown'
Perturbation distribution method: - 'unknown'
- replaces words with UNKs. - 'similarity'
- samples according to a similarity score with the corpus embeddings. - 'language_model'
- samples according the language model's output distributions.
nlp
Optional[spacy.language.Language]
None
spaCy
object when sampling method is 'unknown'
or 'similarity'
.
language_model
Optional[alibi.utils.lang_model.LanguageModel]
None
Transformers masked language model. This is a model that it adheres to the LanguageModel
interface we define in :py:class:alibi.utils.lang_model.LanguageModel
.
seed
int
0
If set, ensure identical random streams.
kwargs
typing.Any
Sampling arguments can be passed as kwargs
depending on the sampling_strategy
. Check default arguments defined in: - :py:data:alibi.explainers.anchor_text.DEFAULT_SAMPLING_UNKNOWN
- :py:data:alibi.explainers.anchor_text.DEFAULT_SAMPLING_SIMILARITY
- :py:data:alibi.explainers.anchor_text.DEFAULT_SAMPLING_LANGUAGE_MODEL
Methods
compare_labels
compare_labels
compare_labels(samples: numpy.ndarray) -> numpy.ndarray
samples
numpy.ndarray
Samples whose labels are to be compared with the instance label.
Returns
Type:
numpy.ndarray
explain
explain
explain(text: str, threshold: float = 0.95, delta: float = 0.1, tau: float = 0.15, batch_size: int = 100, coverage_samples: int = 10000, beam_size: int = 1, stop_on_first: bool = True, max_anchor_size: Optional[int] = None, min_samples_start: int = 100, n_covered_ex: int = 10, binary_cache_size: int = 10000, cache_margin: int = 1000, verbose: bool = False, verbose_every: int = 1, kwargs: typing.Any) -> alibi.api.interfaces.Explanation
text
str
Text instance to be explained.
threshold
float
0.95
Minimum anchor precision threshold. The algorithm tries to find an anchor that maximizes the coverage under precision constraint. The precision constraint is formally defined as :math:P(prec(A) \ge t) \ge 1 - \delta
, where :math:A
is an anchor, :math:t
is the threshold
parameter, :math:\delta
is the delta
parameter, and :math:prec(\cdot)
denotes the precision of an anchor. In other words, we are seeking for an anchor having its precision greater or equal than the given threshold
with a confidence of (1 - delta)
. A higher value guarantees that the anchors are faithful to the model, but also leads to more computation time. Note that there are cases in which the precision constraint cannot be satisfied due to the quantile-based discretisation of the numerical features. If that is the case, the best (i.e. highest coverage) non-eligible anchor is returned.
delta
float
0.1
Significance threshold. 1 - delta
represents the confidence threshold for the anchor precision (see threshold
) and the selection of the best anchor candidate in each iteration (see tau
).
tau
float
0.15
Multi-armed bandit parameter used to select candidate anchors in each iteration. The multi-armed bandit algorithm tries to find within a tolerance tau
the most promising (i.e. according to the precision) beam_size
candidate anchor(s) from a list of proposed anchors. Formally, when the beam_size=1
, the multi-armed bandit algorithm seeks to find an anchor :math:A
such that :math:P(prec(A) \ge prec(A^\star) - \tau) \ge 1 - \delta
, where :math:A^\star
is the anchor with the highest true precision (which we don't know), :math:\tau
is the tau
parameter, :math:\delta
is the delta
parameter, and :math:prec(\cdot)
denotes the precision of an anchor. In other words, in each iteration, the algorithm returns with a probability of at least 1 - delta
an anchor :math:A
with a precision within an error tolerance of tau
from the precision of the highest true precision anchor :math:A^\star
. A bigger value for tau
means faster convergence but also looser anchor conditions.
batch_size
int
100
Batch size used for sampling. The Anchor algorithm will query the black-box model in batches of size batch_size
. A larger batch_size
gives more confidence in the anchor, again at the expense of computation time since it involves more model prediction calls.
coverage_samples
int
10000
Number of samples used to estimate coverage from during anchor search.
beam_size
int
1
Number of candidate anchors selected by the multi-armed bandit algorithm in each iteration from a list of proposed anchors. A bigger beam width can lead to a better overall anchor (i.e. prevents the algorithm of getting stuck in a local maximum) at the expense of more computation time.
stop_on_first
bool
True
If True
, the beam search algorithm will return the first anchor that has satisfies the probability constraint.
max_anchor_size
Optional[int]
None
Maximum number of features to include in an anchor.
min_samples_start
int
100
Number of samples used for anchor search initialisation.
n_covered_ex
int
10
How many examples where anchors apply to store for each anchor sampled during search (both examples where prediction on samples agrees/disagrees with predicted label are stored).
binary_cache_size
int
10000
The anchor search pre-allocates binary_cache_size
batches for storing the boolean arrays returned during sampling.
cache_margin
int
1000
When only max(cache_margin, batch_size)
positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.
verbose
bool
False
Display updates during the anchor search iterations.
verbose_every
int
1
Frequency of displayed iterations during anchor search process.
**kwargs
Other keyword arguments passed to the anchor beam search and the text sampling and perturbation functions.
Returns
Type:
alibi.api.interfaces.Explanation
reset_predictor
reset_predictor
reset_predictor(predictor: Callable) -> None
predictor
Callable
New predictor function.
Returns
Type:
None
sampler
sampler
sampler(anchor: Tuple[int, tuple], num_samples: int, compute_labels: bool = True) -> Union[List[Union[numpy.ndarray, float, int]], List[numpy.ndarray]]
anchor
Tuple[int, tuple]
- int
- the position of the anchor in the input batch. - tuple
- the anchor itself, a list of words to be kept unchanged.
num_samples
int
Number of generated perturbed samples.
compute_labels
bool
True
If True
, an array of comparisons between predictions on perturbed samples and instance to be explained is returned.
Returns
Type:
Union[List[Union[numpy.ndarray, float, int]], List[numpy.ndarray]]
Last updated
Was this helpful?