alibi.prototypes.protoselect
Constants
DEFAULT_DATA_PROTOSELECT
DEFAULT_DATA_PROTOSELECT
DEFAULT_DATA_PROTOSELECT: dict = {'prototypes': None, 'prototype_indices': None, 'prototype_labels': None}
DEFAULT_META_PROTOSELECT
DEFAULT_META_PROTOSELECT
DEFAULT_META_PROTOSELECT: dict = {'name': None, 'type': ['data'], 'explanation': ['global'], 'params': {}, 've...
logger
logger
logger: logging.Logger = <Logger alibi.prototypes.protoselect (WARNING)>
Instances of the Logger class represent a single logging channel. A "logging channel" indicates an area of an application. Exactly how an "area" is defined is up to the application developer. Since an application can have any number of areas, logging channels are identified by a unique string. Application areas can be nested (e.g. an area of "input processing" might include sub-areas "read CSV files", "read XLS files" and "read Gnumeric files"). To cater for this natural nesting, channel names are organized into a namespace hierarchy where levels are separated by periods, much like the Java or Python package namespace. So in the instance given above, channel names might be "input" for the upper level, and "input.csv", "input.xls" and "input.gnu" for the sub-levels. There is no arbitrary limit to the depth of nesting.
ProtoSelect
ProtoSelect
Inherits from: Summariser
, FitMixin
, ABC
, Base
Constructor
ProtoSelect(self, kernel_distance: Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray], eps: float, lambda_penalty: Optional[float] = None, batch_size: int = 10000000000, preprocess_fn: Optional[Callable[[Union[list, numpy.ndarray]], numpy.ndarray]] = None, verbose: bool = False)
kernel_distance
Callable[[.[<class 'numpy.ndarray'>, <class 'numpy.ndarray'>]], numpy.ndarray]
Kernel distance to be used. Expected to support computation in batches. Given an input x
of size Nx x f1 x f2 x ...
and an input y
of size Ny x f1 x f2 x ...
, the kernel distance should return a kernel matrix of size Nx x Ny
.
eps
float
Epsilon ball size.
lambda_penalty
Optional[float]
None
Penalty for each prototype. Encourages a lower number of prototypes to be selected. Corresponds to :math:\lambda
in the paper notation. If not specified, the default value is set to 1 / N
where N
is the size of the dataset to choose the prototype instances from, passed to the :py:meth:alibi.prototypes.protoselect.ProtoSelect.fit
method.
batch_size
int
10000000000
Batch size to be used for kernel matrix computation.
preprocess_fn
Optional[Callable[[.[typing.Union[list, numpy.ndarray]]], numpy.ndarray]]
None
Preprocessing function used for kernel matrix computation. The preprocessing function takes the input as a list
or a numpy
array and transforms it into a numpy
array which is then fed to the kernel_distance
function. The use of preprocess_fn
allows the method to be applied to any data modality.
verbose
bool
False
Whether to display progression bar while computing prototype points.
Methods
fit
fit
fit(X: Union[list, numpy.ndarray], y: Optional[numpy.ndarray] = None, Z: Union[list, numpy.ndarray, None] = None) -> alibi.prototypes.protoselect.ProtoSelect
X
Union[list, numpy.ndarray]
Dataset to be summarised.
y
Optional[numpy.ndarray]
None
Labels of the dataset X
to be summarised. The labels are expected to be represented as integers [0, 1, ..., L-1]
, where L
is the number of classes in the dataset X
.
Z
Union[list, numpy.ndarray, None]
None
Optional dataset to choose the prototypes from. If Z=None
, the prototypes will be selected from the dataset X
. Otherwise, if Z
is provided, the dataset to be summarised is still X
, but it is summarised by prototypes belonging to the dataset Z
.
Returns
Type:
alibi.prototypes.protoselect.ProtoSelect
summarise
summarise
summarise(num_prototypes: int = 1) -> alibi.api.interfaces.Explanation
num_prototypes
int
1
Maximum number of prototypes to be selected.
Returns
Type:
alibi.api.interfaces.Explanation
Functions
compute_prototype_importances
compute_prototype_importances
compute_prototype_importances(summary: alibi.api.interfaces.Explanation, trainset: Tuple[numpy.ndarray, numpy.ndarray], preprocess_fn: Optional[Callable[[.[<class 'numpy.ndarray'>]], numpy.ndarray]] = None, knn_kw: Optional[dict] = None) -> Dict[str, Optional[numpy.ndarray]]
Computes the importance of each prototype. The importance of a prototype is the number of assigned training instances correctly classified according to the 1-KNN classifier (Bien and Tibshirani (2012): https://arxiv.org/abs/1202.5933).
summary
alibi.api.interfaces.Explanation
An Explanation
object produced by a call to the :py:meth:alibi.prototypes.protoselect.ProtoSelect.summarise
method.
trainset
Tuple[numpy.ndarray, numpy.ndarray]
Tuple, (X_train, y_train)
, consisting of the training data instances with the corresponding labels.
preprocess_fn
Optional[Callable[[.[<class 'numpy.ndarray'>]], numpy.ndarray]]
None
Optional preprocessor function. If preprocess_fn=None
, no preprocessing is applied.
knn_kw
Optional[dict]
None
Keyword arguments passed to sklearn.neighbors.KNeighborsClassifier
. The n_neighbors
will be set automatically to 1, but the metric
has to be specified according to the kernel distance used. If the metric
is not specified, it will be set by default to 'euclidean'
. See parameters description: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
Returns
Type:
Dict[str, Optional[numpy.ndarray]]
cv_protoselect_euclidean
cv_protoselect_euclidean
cv_protoselect_euclidean(trainset: Tuple[numpy.ndarray, numpy.ndarray], protoset: Optional[Tuple[numpy.ndarray]] = None, valset: Optional[Tuple[numpy.ndarray, numpy.ndarray]] = None, num_prototypes: int = 1, eps_grid: Optional[numpy.ndarray] = None, quantiles: Optional[Tuple[float, float]] = None, grid_size: int = 25, n_splits: int = 2, batch_size: int = 10000000000, preprocess_fn: Optional[Callable[[.[<class 'numpy.ndarray'>]], numpy.ndarray]] = None, protoselect_kw: Optional[dict] = None, knn_kw: Optional[dict] = None, kfold_kw: Optional[dict] = None) -> dict
Cross-validation parameter selection for ProtoSelect
with Euclidean distance. The method computes the best epsilon radius.
trainset
Tuple[numpy.ndarray, numpy.ndarray]
Tuple, (X_train, y_train)
, consisting of the training data instances with the corresponding labels.
protoset
Optional[Tuple[numpy.ndarray]]
None
Tuple, (Z, )
, consisting of the dataset to choose the prototypes from. If Z
is not provided (i.e., protoset=None
), the prototypes will be selected from the training dataset X
. Otherwise, if Z
is provided, the dataset to be summarised is still X
, but it is summarised by prototypes belonging to the dataset Z
. Note that the argument is passed as a tuple with a single element for consistency reasons.
valset
Optional[Tuple[numpy.ndarray, numpy.ndarray]]
None
Optional tuple (X_val, y_val)
consisting of validation data instances with the corresponding validation labels. 1-KNN classifier is evaluated on the validation dataset to obtain the best epsilon radius. In case valset=None
, then n-splits
cross-validation is performed on the trainset
.
num_prototypes
int
1
The number of prototypes to be selected.
eps_grid
Optional[numpy.ndarray]
None
Optional grid of values to select the epsilon radius from. If not specified, the search grid is automatically proposed based on the inter-distances between X
and Z
. The distances are filtered by considering only values in between the quantiles
values. The minimum and maximum distance values are used to define the range of values to search the epsilon radius. The interval is discretized in grid_size
equidistant bins.
quantiles
Optional[Tuple[float, float]]
None
Quantiles, (q_min, q_max)
, to be used to filter the range of values of the epsilon radius. The expected quantile values are in [0, 1]
and clipped to [0, 1]
if outside the range. See eps_grid
for usage. If not specified, no filtering is applied. Only used if eps_grid=None
.
grid_size
int
25
The number of equidistant bins to be used to discretize the eps_grid
automatically proposed interval. Only used if eps_grid=None
.
n_splits
int
2
batch_size
int
10000000000
Batch size to be used for kernel matrix computation.
preprocess_fn
Optional[Callable[[.[<class 'numpy.ndarray'>]], numpy.ndarray]]
None
Preprocessing function to be applied to the data instance before applying the kernel.
protoselect_kw
Optional[dict]
None
Keyword arguments passed to :py:meth:alibi.prototypes.protoselect.ProtoSelect.__init__
.
knn_kw
Optional[dict]
None
Keyword arguments passed to sklearn.neighbors.KNeighborsClassifier
. The n_neighbors
will be set automatically to 1 and the metric
will be set to 'euclidean
. See parameters description: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
kfold_kw
Optional[dict]
None
Keyword arguments passed to sklearn.model_selection.KFold
. See parameters description: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
Returns
Type:
dict
visualize_image_prototypes
visualize_image_prototypes
visualize_image_prototypes(summary: alibi.api.interfaces.Explanation, trainset: Tuple[numpy.ndarray, numpy.ndarray], reducer: Callable[[.[<class 'numpy.ndarray'>]], numpy.ndarray], preprocess_fn: Optional[Callable[[.[<class 'numpy.ndarray'>]], numpy.ndarray]] = None, knn_kw: Optional[dict] = None, ax: Optional[matplotlib.axes._axes.Axes] = None, fig_kw: Optional[dict] = None, image_size: Tuple[int, int] = (28, 28), zoom_lb: float = 1.0, zoom_ub: float = 3.0) -> matplotlib.axes._axes.Axes
Plot the images of the prototypes at the location given by the reducer
representation. The size of each prototype is proportional to the logarithm of the number of assigned training instances correctly classified according to the 1-KNN classifier (Bien and Tibshirani (2012): https://arxiv.org/abs/1202.5933).
summary
alibi.api.interfaces.Explanation
An Explanation
object produced by a call to the :py:meth:alibi.prototypes.protoselect.ProtoSelect.summarise
method.
trainset
Tuple[numpy.ndarray, numpy.ndarray]
Tuple, (X_train, y_train)
, consisting of the training data instances with the corresponding labels.
reducer
Callable[[.[<class 'numpy.ndarray'>]], numpy.ndarray]
2D reducer. Reduces the input feature representation to 2D. Note that the reducer operates directly on the input instances if preprocess_fn=None
. If the preprocess_fn
is specified, the reducer will be called on the feature representation obtained after passing the input instances through the preprocess_fn
.
preprocess_fn
Optional[Callable[[.[<class 'numpy.ndarray'>]], numpy.ndarray]]
None
Optional preprocessor function. If preprocess_fn=None
, no preprocessing is applied.
knn_kw
Optional[dict]
None
Keyword arguments passed to sklearn.neighbors.KNeighborsClassifier
. The n_neighbors
will be set automatically to 1, but the metric
has to be specified according to the kernel distance used. If the metric
is not specified, it will be set by default to 'euclidean'
. See parameters description: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
ax
Optional[matplotlib.axes._axes.Axes]
None
A matplotlib
axes object to plot on.
fig_kw
Optional[dict]
None
Keyword arguments passed to the fig.set
function.
image_size
Tuple[int, int]
(28, 28)
Shape to which the prototype images will be resized. A zoom of 1 will display the image having the shape image_size
.
zoom_lb
float
1.0
Zoom lower bound. The zoom will be scaled linearly between [zoom_lb, zoom_ub]
.
zoom_ub
float
3.0
Zoom upper bound. The zoom will be scaled linearly between [zoom_lb, zoom_ub]
.
Returns
Type:
matplotlib.axes._axes.Axes
Last updated
Was this helpful?