alibi.explainers.anchors.text_samplers
Constants
TYPE_CHECKING
TYPE_CHECKING
TYPE_CHECKING: bool = False
bool(x) -> bool
Returns True when the argument x is true, False otherwise. The builtins True and False are the only two instances of the class bool. The class bool is a subclass of the class int, and cannot be subclassed.
logger
logger
logger: logging.Logger = <Logger alibi.explainers.anchors.text_samplers (WARNING)>
Instances of the Logger class represent a single logging channel. A "logging channel" indicates an area of an application. Exactly how an "area" is defined is up to the application developer. Since an application can have any number of areas, logging channels are identified by a unique string. Application areas can be nested (e.g. an area of "input processing" might include sub-areas "read CSV files", "read XLS files" and "read Gnumeric files"). To cater for this natural nesting, channel names are organized into a namespace hierarchy where levels are separated by periods, much like the Java or Python package namespace. So in the instance given above, channel names might be "input" for the upper level, and "input.csv", "input.xls" and "input.gnu" for the sub-levels. There is no arbitrary limit to the depth of nesting.
AnchorTextSampler
AnchorTextSampler
Constructor
AnchorTextSampler(self, /, *args, **kwargs)
Methods
set_text
set_text
set_text(text: str) -> None
text
str
Returns
Type:
None
Neighbors
Neighbors
Constructor
Neighbors(self, nlp_obj: 'spacy.language.Language', n_similar: int = 500, w_prob: float = -15.0) -> None
nlp_obj
spacy.language.Language
spaCy
model.
n_similar
int
500
Number of similar words to return.
w_prob
float
-15.0
Smoothed log probability estimate of token's type.
Methods
neighbors
neighbors
neighbors(word: str, tag: str, top_n: int) -> dict
word
str
Word for which we need to find similar words.
tag
str
Part of speech tag for the words.
top_n
int
Return only top_n
neighbors.
Returns
Type:
dict
SimilaritySampler
SimilaritySampler
Inherits from: AnchorTextSampler
Constructor
SimilaritySampler(self, nlp: 'spacy.language.Language', perturb_opts: Dict)
nlp
spacy.language.Language
spaCy
object.
perturb_opts
Dict
Perturbation options.
Methods
find_similar_words
find_similar_words
find_similar_words() -> None
This function queries a spaCy
nlp model to find n
similar words with the same part of speech for each word in the instance to be explained. For each word the search procedure returns a dictionary containing a numpy
array of words ('words'
) and a numpy
array of word similarities ('similarities'
).
Returns
Type:
None
perturb_sentence_similarity
perturb_sentence_similarity
perturb_sentence_similarity(present: tuple, n: int, sample_proba: float = 0.5, forbidden: frozenset = frozenset(), forbidden_tags: frozenset = frozenset({'PRP$'}), forbidden_words: frozenset = frozenset({'be'}), temperature: float = 1.0, pos: frozenset = frozenset({'VERB', 'ADJ', 'ADP', 'NOUN', 'ADV', 'DET'}), use_proba: bool = False, kwargs) -> Tuple[numpy.ndarray, numpy.ndarray]
present
tuple
Word index in the text for the words in the proposed anchor.
n
int
Number of samples used when sampling from the corpus.
sample_proba
float
0.5
Sample probability for a word if use_proba=False
.
forbidden
frozenset
frozenset()
Forbidden lemmas.
forbidden_tags
frozenset
frozenset({'PRP$'})
Forbidden POS tags.
forbidden_words
frozenset
frozenset({'be'})
Forbidden words.
temperature
float
1.0
Sample weight hyper-parameter if use_proba=True
.
pos
frozenset
frozenset({'VERB', 'ADJ', 'ADP', 'NOUN', 'ADV', 'DET'})
POS that can be changed during perturbation.
use_proba
bool
False
Bool whether to sample according to a similarity score with the corpus embeddings.
**kwargs
Other arguments. Not used.
Returns
Type:
Tuple[numpy.ndarray, numpy.ndarray]
set_data_type
set_data_type
set_data_type() -> None
Working with numpy
arrays of strings requires setting the data type to avoid truncating examples. This function estimates the longest sentence expected during the sampling process, which is used to set the number of characters for the samples and examples arrays. This depends on the perturbation method used for sampling.
Returns
Type:
None
set_text
set_text
set_text(text: str) -> None
text
str
Text to be processed.
Returns
Type:
None
UnknownSampler
UnknownSampler
Inherits from: AnchorTextSampler
Constructor
UnknownSampler(self, nlp: 'spacy.language.Language', perturb_opts: Dict)
nlp
spacy.language.Language
spaCy
object.
perturb_opts
Dict
Perturbation options.
Methods
set_data_type
set_data_type
set_data_type() -> None
Working with numpy
arrays of strings requires setting the data type to avoid truncating examples. This function estimates the longest sentence expected during the sampling process, which is used to set the number of characters for the samples and examples arrays. This depends on the perturbation method used for sampling.
Returns
Type:
None
set_text
set_text
set_text(text: str) -> None
text
str
Text to be processed.
Returns
Type:
None
Functions
load_spacy_lexeme_prob
load_spacy_lexeme_prob
load_spacy_lexeme_prob(nlp: spacy.language.Language) -> spacy.language.Language
This utility function loads the lexeme_prob
table for a spacy model if it is not present. This is required to enable support for different spacy versions.
nlp
spacy.language.Language
Returns
Type:
spacy.language.Language
Last updated
Was this helpful?