# White-box and black-box models

Explainer algorithms can be categorised in many ways (see [this table](https://docs.seldon.ai/alibi-explain/algorithms#model-explanations)), but perhaps the most fundamental one is whether they work with **white-box** or **black-box** models.

**White-box** is a term used for any model that the explainer method can “look inside” and manipulate arbitrarily. In the context of `alibi`, this category corresponds to Python objects that represent models, for example instances of `sklearn.base.BaseEstimator`, `tensorflow.keras.Model`, `torch.nn.Module` etc. The exact type of the white-box model determines which white-box explainer methods can be used. For example, `tensorflow` and `torch` models support gradient computation, which enables *gradient-based* explainer methods such as [Integrated Gradients](https://github.com/ramonpzg/alibi/blob/rp-alibi-newdocs-dec23/doc/source/methods/IntegratedGradients.ipynb), whilst various types of tree-based models are supported by [TreeShap](https://github.com/ramonpzg/alibi/blob/rp-alibi-newdocs-dec23/doc/source/methods/TreeSHAP.ipynb).

On the other hand, a **black-box** model is any model that the explainer method may not inspect or modify arbitrarily. The only interaction with the model is via calling its `predict` function (or similar) on data and receiving predictions back. In the context of `alibi`, black-box models have a concrete definition: they are functions that take in a `numpy` array representing the data and return a `numpy` array representing a prediction. Using [type hints](https://docs.python.org/3/library/typing.html), we can define a general black-box model (also referred to as a *prediction function*) to be of type `Callable[[np.ndarray], np.ndarray]`. Explainers that expect black-box models as input are very flexible, as *any* function conforming to this type can be explained by black-box explainers.

{% hint style="info" %}
**Note**: In addition to conforming to the expected type, black-box models **must** support batch prediction, i.e. `alibi` explainers assume that the first dimension of the input array is always the batch dimension.
{% endhint %}
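
For illustration, a minimal prediction function conforming to this interface might look like the following sketch. The constant classifier here is purely hypothetical; the point is the type signature and the preserved batch dimension:

```python
import numpy as np

def predictor(X: np.ndarray) -> np.ndarray:
    # hypothetical constant classifier: returns one probability vector
    # per input row, preserving the batch (first) dimension
    return np.tile([0.3, 0.7], (X.shape[0], 1))

predictor(np.zeros((5, 3))).shape  # (5, 2)
```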

{% hint style="warning" %}
**Warning**: There is currently one exception to the black-box interface: the `AnchorText` explainer expects the prediction function to be of type `Callable[[List[str]], np.ndarray]`, i.e. the model is expected to work on batches of raw text (here `List[str]` indicates a batch of text strings). See [this example](https://github.com/SeldonIO/alibi/blob/master/docs-gb/source/examples/anchor_text_movie.ipynb) for more information.
{% endhint %}
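
A prediction function of this form might look like the following sketch, where `vectorizer` and `clf` are assumed to be a fitted `sklearn` text vectorizer and classifier:

```python
from typing import List
import numpy as np

def predictor(texts: List[str]) -> np.ndarray:
    # hypothetical pipeline: vectorize the raw text, then return class probabilities
    return clf.predict_proba(vectorizer.transform(texts))
```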

## Wrapping white-box models into black-box models

Models in Python all start out as white-box models (i.e. custom Python objects from some modelling library like `sklearn` or `tensorflow`). However, to be used with explainers that expect a black-box prediction function, the user has to define a prediction function that conforms to the black-box definition given above. Here we give a few common examples and some pointers about creating a black-box prediction function from a white-box model. In what follows we distinguish between the original white-box `model` and the wrapped black-box `predictor` function.

### Scikit-learn models

All `sklearn` models expose a `predict` method that already conforms to the black-box function interface defined above which makes it easy to create black-box predictors:

```python
predictor = model.predict
explainer = SomeExplainer(predictor, **kwargs)
```

In some cases for classifiers, it may be more appropriate to expose the `predict_proba` or `decision_function` method instead of `predict`; see an example on [ALE for classifiers](https://github.com/ramonpzg/alibi/blob/rp-alibi-newdocs-dec23/doc/source/examples/ale_classification.ipynb).
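
For example, to explain class probabilities rather than hard label predictions, expose `predict_proba` in the same way:

```python
predictor = model.predict_proba
explainer = SomeExplainer(predictor, **kwargs)
```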

### Tensorflow models

Tensorflow models (specifically instances of `tensorflow.keras.Model`) expose a `predict` method that takes in `numpy` arrays and returns predictions as `numpy` arrays:

```python
predictor = model.predict
explainer = SomeExplainer(predictor, **kwargs)
```

### Pytorch models

Pytorch models (specifically instances of `torch.nn.Module`) expect and return instances of `torch.Tensor` from the `forward` method, so we need to do a bit more work to define the black-box `predictor` function:

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # example choice
dtype = torch.float32  # example choice; match your model's parameter dtype

model.eval()  # ensure the model is in evaluation mode

@torch.no_grad()  # gradients are not needed for prediction
def predictor(X: np.ndarray) -> np.ndarray:
    X = torch.as_tensor(X, dtype=dtype, device=device)  # numpy -> tensor on device
    return model(X).cpu().numpy()  # move output to cpu and convert back to numpy
```

Note that there are a few differences compared to `tensorflow` models:

* Ensure the model is in evaluation mode (i.e. call `model.eval()`) and that it does not switch back to training mode (i.e. `model.train()`) between consecutive calls to the explainer. Otherwise, consider calling `model.eval()` inside the `predictor` function.
* Decorate the `predictor` with `@torch.no_grad()` to avoid computing and storing gradients, which are not needed for prediction.
* Explicit conversion to a tensor with a specific `dtype`. Whilst `tensorflow` handles this internally when `predict` is called, for `torch` we need to do this manually.
* Explicit device selection for the tensor. This is an important step as `numpy` arrays are limited to the CPU; if your model is on a GPU, it will expect its input tensors to be on the GPU as well.
* Explicit conversion of the prediction tensor to `numpy`. We first move the output to the CPU and then convert it to a `numpy` array.

If you are using [Pytorch Lightning](https://www.pytorchlightning.ai) to create `torch` models, then the `dtype` and `device` can be retrieved as attributes of your `LightningModule`; see [here](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html).
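
For instance, assuming your `LightningModule` instance is named `lit_model` (a hypothetical name), the attributes can be read off directly:

```python
dtype, device = lit_model.dtype, lit_model.device
```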

### General models

Given the above examples, the pattern for defining a black-box predictor from a white-box model is clear: define a `predictor` function that manipulates the inputs to and outputs from the underlying model in a way that conforms to the black-box model interface `alibi` expects:

```python
def predictor(X: np.ndarray) -> np.ndarray:
    inp = transform_input(X)
    output = model(inp) # or call the model-specific prediction method
    output = transform_output(output)
    return output

explainer = SomeExplainer(predictor, **kwargs)
```

Here `transform_input` and `transform_output` are general user-defined functions that transform the input `numpy` array into the format the model expects, and transform the model's output into a `numpy` array, so that `predictor` is an `alibi`-compatible black-box function.
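
As a hypothetical illustration, suppose the underlying model expects a pandas `DataFrame` with named columns and returns predictions as a plain Python list (`feature_names` and the model's behaviour are assumptions for this sketch):

```python
import numpy as np
import pandas as pd

def predictor(X: np.ndarray) -> np.ndarray:
    inp = pd.DataFrame(X, columns=feature_names)  # transform_input: model expects a DataFrame
    output = model.predict(inp)                   # model-specific prediction call
    return np.asarray(output)                     # transform_output: ensure a numpy array
```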

