# Drift Detection

When ML models are deployed in production, even minor changes in the input data distribution can degrade their prediction quality, so it is important to track this drift. This demo is based on the [mixed-type tabular data](https://docs.seldon.io/projects/alibi-detect/en/latest/cd/methods/tabulardrift.html) drift detection method from the Alibi Detect project.

Here we will:

* Launch an income classifier model based on [demographic features from a 1996 US census](https://archive.ics.uci.edu/dataset/20/census+income). The data instances contain a person’s characteristics like age, marital status or education while the label represents whether the person makes more or less than $50k per year.
* Set up a mixed-type tabular data drift detector for this model.
* Run batch predictions over time.
* Track the drift metrics in the Monitoring dashboard.

{% hint style="warning" %}
This demo requires Knative to be installed on the cluster, because the drift detector is deployed as a Knative Service (kservice). See the [Knative installation instructions](/seldon-enterprise-platform/production-environment/knative.md) for the necessary setup.
{% endhint %}

## Register an income classifier model

Register a pre-trained income classifier SKLearn model.

1. On the `Model Catalog` page, click `Register a new model`:

   !["Register a new model" button on the Model Catalog page](/files/f40YduRjaeZW0pvnAmij)
2. In the `Register New Model` wizard, enter the following information, then click `Register Model`:

   * *Model Name*: `income-classifier`
   * *URI*: `gs://seldon-models/scv2/samples/mlserver_1.6.0/income-sklearn/classifier/`
   * *Artifact Type*: `SciKit Learn`
   * *Version*: `v1`

   ![Model configuration wizard](/files/JW50768QuvhqNKagvKha)

## Configure predictions schema for classifier

Edit the model metadata to update the prediction schema for the model. The prediction schema is a generic schema structure for machine learning model predictions: it defines the feature inputs and output targets of a model's predictions. Learn more about the predictions schema in the [ML Predictions Schema](https://github.com/SeldonIO/ml-prediction-schema) open source repository. Download the income classifier predictions schema `income-classifier-prediction-schema.json` below and use it to edit and save the model-level metadata.

{% file src="/files/KarkDsbbSlDBd4oKaUuy" %}
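Before pasting the schema into the UI, you may want to sanity-check it. The following is a minimal sketch, assuming the file follows the ML Prediction Schema layout with top-level `requests` (model inputs) and `responses` (model outputs) lists whose entries carry `name` and `type` fields. The helper `summarize_schema` and the excerpt used below are illustrative only and are not part of Seldon Enterprise Platform.

```python
import json


def summarize_schema(schema_text: str) -> dict:
    """Validate the basic shape of a prediction schema and list its features.

    Assumes the ML Prediction Schema layout: top-level "requests" (inputs) and
    "responses" (outputs) lists whose entries each have "name" and "type".
    """
    schema = json.loads(schema_text)
    for key in ("requests", "responses"):
        entries = schema.get(key)
        if not isinstance(entries, list) or not entries:
            raise ValueError(f"schema needs a non-empty '{key}' list")
        for entry in entries:
            if "name" not in entry or "type" not in entry:
                raise ValueError(f"'{key}' entry is missing 'name' or 'type': {entry}")
    return {
        "inputs": [f["name"] for f in schema["requests"]],
        "categorical_inputs": [
            f["name"] for f in schema["requests"] if f["type"] == "CATEGORICAL"
        ],
        "outputs": [f["name"] for f in schema["responses"]],
    }


# Illustrative excerpt only; the real file defines all income classifier features.
excerpt = """
{
  "requests": [
    {"name": "Age", "type": "REAL", "dataType": "FLOAT"},
    {"name": "Education", "type": "CATEGORICAL", "dataType": "INT"}
  ],
  "responses": [
    {"name": "Income", "type": "PROBA", "dataType": "FLOAT"}
  ]
}
"""
print(summarize_schema(excerpt))
```

To check the downloaded file instead of the excerpt, pass `open("income-classifier-prediction-schema.json").read()` to `summarize_schema`.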

1. Click the model `income-classifier` that you registered.

   ![Select "income-classifier" model on the Model Catalog page](/files/wp6bml9poTIBCtgT3G2h)
2. Click `Edit Metadata` and paste the contents of `income-classifier-prediction-schema.json` into the **Prediction schema** field associated with the model.

   ![Model's metadata wizard](/files/V1P3kl66ymxfPb1anFpx)
3. Click `Save Metadata`.

## Launch a Seldon Deployment

Deploy the income classifier model from the catalog into an appropriate namespace.

1. In the **Model catalog**, select **Deploy** from the **Action** dropdown.

   !["Deploy" action in the dropdown on the Model Catalog page](/files/i0adewhXXW3T6A44aK8Z)
2. Enter the deployment details in the deployment creation wizard and click `Next`:

   * *Name*: `income-classifier`
   * *Namespace*: `seldon`
   * *Type*: `Seldon Deployment`

   ![Income classifier deployment details](/files/rrLmHj6CXlPAE12LQtN7)
3. The predictor details are prefilled from the model catalog, except for the **Model Name** field. Set it to `income`:

   * *Model Name*: `income`

   <div data-gb-custom-block data-tag="hint" data-style="warning" class="hint hint-warning"><p>The <code>Model Name</code> must match the <code>name</code> field in the <code>model-settings.json</code> file stored with the model artifacts in Google Cloud Storage. If you change the name in the JSON file, you must also change the <code>Model Name</code>, and vice versa.</p></div>

   ![Income classifier deployment predictor](/files/wNHacbYPW2ZClgGK3O1d)

4. Click `Next` for the remaining steps, then click `Launch`.
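The warning in step 3 means that the **Model Name** you enter must equal the `name` field of the MLServer `model-settings.json` stored next to the model artifacts. A minimal sketch of that check, assuming you have a local copy of the file. The `implementation` and `parameters` values in the example are illustrative and may differ from the actual file in the sample bucket; `model_name_matches` is a hypothetical helper.

```python
import json


def model_name_matches(settings_text: str, ui_model_name: str) -> bool:
    """Return True if the model-settings.json "name" equals the Model Name entered in the UI."""
    settings = json.loads(settings_text)
    return settings.get("name") == ui_model_name


# Illustrative model-settings.json content (not copied from the sample bucket).
example_settings = """
{
  "name": "income",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {"uri": "./model.joblib"}
}
"""

print(model_name_matches(example_settings, "income"))             # True
print(model_name_matches(example_settings, "income-classifier"))  # False
```

Note that the deployment name (`income-classifier`) and the model name (`income`) are separate fields; only the model name has to match the settings file.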

## Add a Drift Detector

From the *deployment overview page*, select your deployment to open the deployment dashboard. In the *deployment dashboard*, add a drift detector by clicking the `Add` button in the `Drift Detection` widget.

<details>

<summary>Expand to see drift detector creation</summary>

<img src="/files/kFdL1dzco2zYu9h9CwkK" alt="configure drift detector" data-size="original">

<img src="/files/kInBxJw6VHQBKstAnNV6" alt="create drift detector" data-size="original">

</details>

In the dialog that appears, enter the following parameters to configure the detector:

* *Model Name*: `income-drift`.
* *Storage URI*: enter the following URI (the secret field is optional for public Google Cloud Storage buckets):

  ```
  gs://seldon-models/scv2/samples/mlserver_1.6.0/income-sklearn/drift-detector
  ```
* *Reply URL*: Leave the default value shown below. If you are using a custom installation, change it to match your installation.

  ```
  http://seldon-request-logger.seldon-logs
  ```
* *Minimum Batch Size*: `200`
* *Drift Type*: `Feature`

Then, click `CREATE DETECTOR` to complete the setup.
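For numerical features, the mixed-type tabular method in Alibi Detect runs a feature-wise two-sample Kolmogorov-Smirnov (K-S) test between the reference data and each incoming batch (categorical features use a Chi-squared test instead), then aggregates the per-feature p-values with a multivariate correction, Bonferroni by default. The following is a simplified, standard-library-only sketch of the numerical part of that idea, not the Alibi Detect implementation: `feature_drift` and its inputs are hypothetical, the p-value uses the asymptotic Kolmogorov distribution, and `p_val=0.05` is an assumed significance level. The K-S statistic plays the role of a distance score, and the corrected threshold is what a feature's p-value is compared against.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Tuple


def ks_statistic(ref: List[float], new: List[float]) -> float:
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    ref_sorted, new_sorted = sorted(ref), sorted(new)
    n, m = len(ref_sorted), len(new_sorted)
    # The gap can only change at observed values, so checking each pooled value suffices.
    return max(
        abs(bisect_right(ref_sorted, x) / n - bisect_right(new_sorted, x) / m)
        for x in ref_sorted + new_sorted
    )


def ks_p_value(d: float, n: int, m: int) -> float:
    """Asymptotic p-value for the K-S statistic d with sample sizes n and m."""
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return 1.0  # the Kolmogorov survival function is indistinguishable from 1 here
    # Q_KS(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return min(max(total, 0.0), 1.0)


def feature_drift(
    ref: Dict[str, List[float]], batch: Dict[str, List[float]], p_val: float = 0.05
) -> Tuple[bool, float, Dict[str, Tuple[float, float, bool]]]:
    """Feature-wise K-S tests with a Bonferroni-corrected threshold.

    Returns (is_drift, threshold, {feature: (distance, p_value, drifted)}).
    """
    threshold = p_val / len(ref)  # Bonferroni: split the significance level across features
    results = {}
    for name, ref_values in ref.items():
        d = ks_statistic(ref_values, batch[name])
        p = ks_p_value(d, len(ref_values), len(batch[name]))
        results[name] = (d, p, p < threshold)
    # The batch is flagged if any single feature falls below the corrected threshold.
    is_drift = any(drifted for _, _, drifted in results.values())
    return is_drift, threshold, results


# Usage: a hypothetical "Age" feature, unchanged vs. shifted by 30 years.
reference = {"Age": [float(20 + i % 40) for i in range(200)]}
print(feature_drift(reference, {"Age": list(reference["Age"])})[0])              # False
print(feature_drift(reference, {"Age": [a + 30 for a in reference["Age"]]})[0])  # True
```

Splitting the significance level across features keeps the chance of a false alarm for the whole batch at roughly `p_val`, at the cost of needing a stronger signal in any single feature.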

## Configure predictions schema for detector

As with the income classifier model, use the same predictions schema `income-classifier-prediction-schema.json` to edit and save the model-level metadata for the drift detector.

1. Click the vertical ellipsis (⋮) icon for the drift detector you have just created.

   ![select detector](/files/bmdOfyzfrXQ9LvaKYcgk)
2. Click the `Configure Metadata` option to update the prediction schema associated with the detector.
3. Paste the contents of the downloaded `income-classifier-prediction-schema.json`, set the model name to `income-drift`, and click `Save Metadata`.

   ![configure prediction schema](/files/TFrVwm1zXQxAuBDHlQ9Z)

## Run Batch Predictions

1. From the deployment dashboard, click `Batch Jobs`. You will run a batch prediction job using the text data file `data.txt`, which contains prediction requests in the Open Inference Protocol (OIP) payload format.

{% file src="/files/UFM6st42ioQSBm87H64G" %}

This file contains 4000 data points. With the detector's minimum batch size of `200`, drift is evaluated once for every 200 points. The first half of the data follows the same distribution as the reference data the drift detector was configured with, while the second half follows a different distribution, so drift should be detected for the later batches.
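Each line of an OIP-format input file holds one inference request. A minimal sketch of how such a line could be produced from a feature row, following the Open Inference Protocol (V2) request layout of an `inputs` list whose entries have `name`, `shape`, `datatype`, and `data`. The input name `predict`, the `INT64` datatype, and the 12 integer-encoded feature values are assumptions, so compare against a line of the downloaded `data.txt` before relying on them.

```python
import json
from typing import List


def to_oip_line(row: List[int], input_name: str = "predict") -> str:
    """Serialise one feature row as a single-line OIP (V2) inference request."""
    request = {
        "inputs": [
            {
                "name": input_name,        # hypothetical input tensor name
                "shape": [1, len(row)],    # one instance with len(row) features
                "datatype": "INT64",       # assumption: integer-encoded features
                "data": row,               # row-major tensor contents
            }
        ]
    }
    # Compact separators keep each request on one line, one request per line.
    return json.dumps(request, separators=(",", ":"))


# Hypothetical integer-encoded row with 12 features.
example_row = [53, 4, 0, 2, 8, 4, 4, 0, 0, 0, 60, 9]
print(to_oip_line(example_row))
```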

2. Upload the data to a bucket store of your choice. This demo uses [MinIO](/seldon-enterprise-platform/production-environment/minio.md) and stores the data at the bucket path `minio://income-batch-data/data.txt`. Remember to [configure your storage access credentials secret](/seldon-enterprise-platform/operations/storage-initializers.md#configuration); in this demo, the secret is named `minio-bucket-envvars`. Refer to the [batch request demo](/seldon-enterprise-platform/demos/seldon-core-v1/batch-requests.md#setup-input-data) for an example of how to do this through the MinIO browser.
3. Run a batch job with the configuration below. This runs an offline job that sends a prediction request for a batch of 200 rows from the file at `minio://income-batch-data/data.txt` every `5 seconds`:

```
Input Data Location: minio://income-batch-data/data.txt
Output Data Location: minio://income-batch-data/output-{{workflow.name}}.txt
Number of Workers: 1
Number of Retries: 3
Batch Size: 200
Minimum Batch Wait Interval (sec): 5
Method: Predict
Transport Protocol: REST
Input Data Type: Open Inference Protocol (OIP)
Storage Secret Name: minio-bucket-envvars
```
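The numbers in this setup fit together as follows: 4000 rows in batches of 200 give 20 prediction requests, of which the first 10 come from the reference distribution. A small sketch of that arithmetic; the half-way drift boundary comes from the description of `data.txt`, while treating each 200-row batch as one detector window and assuming one request every 5 seconds are simplifications, not guarantees about how the batch processor and detector schedule work.

```python
from typing import List


def batch_ranges(n_rows: int, batch_size: int) -> List[range]:
    """Split row indices 0..n_rows-1 into consecutive batches of at most batch_size rows."""
    return [range(start, min(start + batch_size, n_rows)) for start in range(0, n_rows, batch_size)]


N_ROWS, BATCH_SIZE, WAIT_SECONDS = 4000, 200, 5
DRIFT_STARTS_AT = N_ROWS // 2  # the second half of data.txt differs from the reference data

batches = batch_ranges(N_ROWS, BATCH_SIZE)
# A batch is expected to drift if all of its rows come from the second half.
expected = [b.start >= DRIFT_STARTS_AT for b in batches]

print(len(batches))                                  # 20
print(expected.count(False), expected.count(True))   # 10 10
# Rough job duration, assuming one request every WAIT_SECONDS and ignoring inference time.
print(len(batches) * WAIT_SECONDS)                   # 100
```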

## Monitor Drift Detection Metrics

Under the `Monitor` section of your deployment navigation, on the `Drift Detection` tab, you can see a timeline of drift detection metrics.

The drift dashboard shows two types of metric graphs:

1. P-value score over time

   * 1a. Zoomed-in view, focusing on features that have drifted, i.e. features with a p-value score below the threshold.

   ![p-values drift metrics zoomed in](/files/hcziHkr4H5D8sFQrNUwH)

   * 1b. Zoomed-out view, showing all features.

   ![p-values drift metrics zoomed out](/files/ZUCZr5d3923KZD8D56vs)
2. Distance score over time.

   ![distance score metrics](/files/MPmjz2j80ASbOMd0yrhT)

In both graphs, the early batches show no drift and are marked with an **empty `O` symbol**, while the later batches drift and are marked with a **filled `O` symbol**.

## Monitor Drift Detection Alerts

If you have [alerting configured](/seldon-enterprise-platform/production-environment/observability-alerting/alerting.md), you should see a notification about the drift:

![alert notification](/files/dfpLtGrOKOjsh6VpLOy4)

Further details are available in the alerting log:

![alerting log](/files/rL80HkCQ0QjRUnzXPIBV)

## Data drift and reference distributions comparison

To analyse the prediction data drift further, switch to the feature distributions tab and compare the prediction data with the reference data distribution. See the [feature distribution monitoring](/seldon-enterprise-platform/demos/seldon-core-v1/distributions-monitoring.md#reference-data-distributions-comparison) demo for setup details.

Upload the income classifier reference dataset `drift-reference-v2.csv` as the reference data to monitor data drift in terms of feature distributions.

{% file src="/files/IcNJxsseB2BT9oVIeuyS" %}

Once reference data is available, you can compare the distributions of the prediction data to the reference data.

To check whether reference data is available, look at the button at the top left of the `Distributions` dashboard: if it displays `Reference data available` and is not clickable, the reference data has been loaded.

![Reference data available](/files/QL9XUwNaIeg6eMPo6hWi)

For each feature, you can click on `Toggle reference data` to view reference data side by side.

![Monitor distributions](/files/Ox8eETapATGc9k1xjZWp)

You should see that the drifted data contains individuals with lower education levels who are not present in the reference data.
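The same comparison can be reasoned about numerically: for a categorical feature such as education, compare the share of each category in the reference data with its share in the prediction data, and look for categories that appear only in the prediction data. A minimal sketch using hypothetical, already-decoded category labels and made-up counts; in practice the values would come from `drift-reference-v2.csv` and the logged prediction requests, and `category_shift` is an illustrative helper, not a platform API.

```python
from collections import Counter
from typing import Dict, List, Tuple


def category_shift(
    reference: List[str], predictions: List[str]
) -> Tuple[Dict[str, Tuple[float, float]], List[str]]:
    """Return per-category (reference share, prediction share) and categories unseen in the reference."""
    ref_counts, pred_counts = Counter(reference), Counter(predictions)
    categories = sorted(set(ref_counts) | set(pred_counts))
    shares = {
        c: (ref_counts[c] / len(reference), pred_counts[c] / len(predictions))
        for c in categories
    }
    # Counter returns 0 for missing keys, so these categories never occur in the reference data.
    unseen = [c for c in categories if ref_counts[c] == 0]
    return shares, unseen


# Hypothetical education values; the labels and counts are illustrative only.
reference = ["Bachelors"] * 50 + ["High School grad"] * 50
predictions = ["High School grad"] * 40 + ["Dropout"] * 60

shares, unseen = category_shift(reference, predictions)
print(unseen)               # ['Dropout']
print(shares["Bachelors"])  # (0.5, 0.0)
```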

## Troubleshooting

If you experience issues with this demo, see the [troubleshooting docs](/seldon-enterprise-platform/help-and-support.md), as well as the [request logging](/seldon-enterprise-platform/production-environment/request-logging.md) and [Elasticsearch](/seldon-enterprise-platform/production-environment/elasticsearch.md) sections.


