Drift Detection
When ML models are deployed in production, even minor changes in the data distribution can adversely affect model performance: when the input data distribution shifts, prediction quality can drop. It is therefore important to track this drift. This demo is based on the mixed-type tabular data drift detection method in the alibi-detect project for tabular datasets.
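The underlying idea can be sketched as follows. This is a conceptual illustration only, not the actual alibi-detect API: a two-sample Kolmogorov–Smirnov test per numerical feature, a chi-squared test per categorical feature, and a Bonferroni-corrected significance threshold. The feature layout and data below are hypothetical.

```python
import numpy as np
from scipy import stats

def tabular_drift(x_ref, x, categorical_cols, p_val=0.05):
    """Conceptual sketch of mixed-type tabular drift detection:
    K-S test per numerical feature, chi-squared test per categorical
    feature, Bonferroni-corrected threshold across features."""
    n_features = x_ref.shape[1]
    threshold = p_val / n_features  # Bonferroni correction
    p_vals = []
    for i in range(n_features):
        a, b = x_ref[:, i], x[:, i]
        if i in categorical_cols:
            # Contingency table of category counts in each sample.
            cats = np.unique(np.concatenate([a, b]))
            table = np.stack([
                [(a == c).sum() for c in cats],
                [(b == c).sum() for c in cats],
            ])
            p = stats.chi2_contingency(table)[1]
        else:
            p = stats.ks_2samp(a, b).pvalue
        p_vals.append(p)
    is_drift = int(any(p < threshold for p in p_vals))
    return is_drift, p_vals

rng = np.random.default_rng(0)
# Column 0: numerical (age-like); column 1: categorical (3 categories).
x_ref = np.column_stack([rng.normal(40, 10, 1000), rng.integers(0, 3, 1000)])
x_drift = np.column_stack([rng.normal(55, 10, 1000), rng.integers(0, 3, 1000)])
print(tabular_drift(x_ref, x_drift, categorical_cols={1})[0])  # 1 (drift)
```

The hosted detector used in this demo applies the same kind of per-feature tests, which is why a prediction schema marking each feature as numerical or categorical is needed later in the setup.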
In this demo we will:
Launch an income classifier model based on demographic features from a 1996 US census. The data instances contain a person’s characteristics like age, marital status or education while the label represents whether the person makes more or less than $50k per year.
Setup a mixed-type tabular data drift detector for this particular model.
Make a batch of predictions over time.
Track the drift metrics in the Monitoring dashboard.
This demo requires a Knative installation on the cluster, as the drift detector will be installed as a Knative Service (kservice). See the Knative installation instructions for the necessary setup.
Register an income classifier model
Register a pre-trained income classifier SKLearn model.
In the Model Catalog page, click Register a new model.
In the Register New Model wizard, enter the following information, then click Register Model:
Model Name: income-classifier
URI: gs://seldon-models/scv2/samples/mlserver_1.6.0/income-sklearn/classifier/
Artifact Type: SciKit Learn
Version: v1
Configure predictions schema for classifier
Edit the model metadata to update the prediction schema for the model. The prediction schema is a generic schema structure for machine learning model predictions: a definition of the feature inputs and output targets of the model. Learn more about the predictions schema at the ML Predictions Schema open source repository. Use the income classifier predictions schema income-classifier-prediction-schema.json to edit and save the model-level metadata.
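As a rough illustration of what a prediction schema contains, a fragment in the style of the ML Predictions Schema might look like the following. The feature names, category maps, and values here are purely illustrative; use the downloaded income-classifier-prediction-schema.json for the actual schema:

```json
{
  "requests": [
    { "name": "Age", "type": "REAL", "dataType": "FLOAT" },
    {
      "name": "Marital Status",
      "type": "CATEGORICAL",
      "dataType": "INT",
      "nCategories": "3",
      "categoryMap": { "0": "Married", "1": "Never married", "2": "Divorced" }
    }
  ],
  "responses": [
    { "name": "Income", "type": "PROBA", "dataType": "FLOAT" }
  ]
}
```

Marking each feature as REAL or CATEGORICAL is what allows the detector to apply the appropriate statistical test per feature.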
Click the income-classifier model that you registered.
Click Edit Metadata and update the Prediction schema field associated with the model using the contents of income-classifier-prediction-schema.json.
Click Save Metadata.
Launch a Seldon Deployment
Deploy the income classifier model from the catalog into an appropriate namespace.
In the Model Catalog, select Deploy from the Action dropdown.
Enter the deployment details in the deployment creation wizard, then click Next:
Name: income-classifier
Namespace: seldon
Type: Seldon Deployment
The predictor details should already be filled in from the model catalog, except for the Model Name field. Use income for the Model Name field:
Model Name: income
The Model Name is linked to the name described in the model-settings.json file located in the Google Cloud Storage location. Changing the name in the JSON file would also require changing the Model Name, and vice versa.
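For reference, a model-settings.json for an MLServer scikit-learn model typically has the following shape. This is a sketch; the actual file in the bucket may differ:

```json
{
  "name": "income",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}
```

The name field here is the value that must match the Model Name entered in the wizard.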

Click Next for the remaining steps, then click Launch.
Add a Drift Detector
From the deployment overview page, select your deployment to enter the deployment dashboard. In the deployment dashboard, add a drift detector by clicking the Add button within the Drift Detection widget.
Enter the following parameters in the modal popup that appears to configure the detector:
Model Name: income-drift
Storage URI (for public Google buckets, the secret field is optional): gs://seldon-models/scv2/samples/mlserver_1.6.0/income-sklearn/drift-detector
Reply URL: Leave as the default value shown below; if you are using a custom installation, change this parameter according to your installation: http://seldon-request-logger.seldon-logs
Minimum Batch Size: 200
Drift Type: Feature
Then, click CREATE DETECTOR to complete the setup.
Configure predictions schema for detector
As with the income classifier model, use the same predictions schema income-classifier-prediction-schema.json to edit and save the model-level metadata for the drift detector.
Click the vertical ellipsis "⋮" icon for the drift detector you have just registered.
Click the Configure Metadata option to update the prediction schema associated with the model.
Paste the downloaded income-classifier-prediction-schema.json, name the model income-drift, and click Save Metadata.
Run Batch Predictions
From the deployment dashboard, click Batch Jobs. Run a batch prediction job using the predictions data file data.txt, which is in the Open Inference Protocol (OIP) payload format.
This file contains 4000 individual data points. Based on our drift detector configuration, drift is evaluated for each batch of 200 points. The first half of the file follows the same distribution as the reference data the drift detector was configured with, while the second half follows a different distribution so that drift can be observed.
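For illustration, each line of an OIP-format data file is a standalone inference request. A single line might look like the following; the tensor name, shape, and feature values here are hypothetical, not taken from the actual data.txt:

```json
{"inputs": [{"name": "income", "shape": [1, 12], "datatype": "INT64", "data": [[53, 4, 0, 2, 8, 4, 2, 0, 0, 0, 60, 9]]}]}
```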
Upload the data to a bucket store of your choice. This demo uses MinIO and stores the data at the bucket path minio://income-batch-data/data.txt. Do not forget to configure your storage access credentials secret; here it is named minio-bucket-envvars. Refer to the batch request demo for an example of how this can be done via the MinIO browser.
Run a batch job with the configuration below. This runs an offline job that makes a prediction request for each batch of 200 rows in the file at minio://income-batch-data/data.txt every 5 seconds:
Input Data Location: minio://income-batch-data/data.txt
Output Data Location: minio://income-batch-data/output-{{workflow.name}}.txt
Number of Workers: 1
Number of Retries: 3
Batch Size: 200
Minimum Batch Wait Interval (sec): 5
Method: Predict
Transport Protocol: REST
Input Data Type: Open Inference Protocol (OIP)
Storage Secret Name: minio-bucket-envvars
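The batching behaviour described above can be sketched in a few lines: the job reads the OIP requests line by line and groups them into batches of Batch Size rows before each prediction request. This is a simplified sketch with the HTTP call omitted, and the sample lines are hypothetical:

```python
import json

def batch_requests(lines, batch_size=200):
    """Group one-request-per-line OIP payloads into batches of
    `batch_size` rows; the real job would send each batch to the
    deployment's inference endpoint, waiting between batches."""
    batch = []
    for line in lines:
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly partial, batch
        yield batch

# Two hypothetical rows of integer-encoded census features.
lines = [
    '{"inputs": [{"name": "income", "shape": [1, 12], "datatype": "INT64", "data": [[53, 4, 0, 2, 8, 4, 2, 0, 0, 0, 60, 9]]}]}',
    '{"inputs": [{"name": "income", "shape": [1, 12], "datatype": "INT64", "data": [[25, 4, 1, 1, 5, 1, 4, 1, 0, 0, 40, 9]]}]}',
]
print(len(list(batch_requests(lines, batch_size=2))))  # 1
```

With 4000 rows and a batch size of 200, this yields 20 batches, so 20 drift evaluations appear on the monitoring timeline.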
Monitor Drift Detection Metrics
Under the Monitor section of your deployment navigation, on the Drift Detection tab, you can see a timeline of drift detection metrics.
The drift dashboard shows two types of metrics graphs:
P-value score over time, with a zoomed-in view focusing on features that have drifted (i.e. features with a p-value score below the threshold) and a zoomed-out view showing all features.
Distance score over time.

Note that for both drift metrics graphs, the starting batches do not drift and are marked by an empty O
symbol, while the later batches do drift and are marked by a filled O
symbol.
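The marker logic can be illustrated with a toy p-value timeline; the values and the 0.05 threshold below are assumed for illustration only:

```python
# Hypothetical per-batch p-value scores for a single feature.
p_values = [0.64, 0.41, 0.57, 0.002, 0.0004, 0.001]
threshold = 0.05  # assumed significance threshold

# Empty marker: no drift (p >= threshold); filled marker: drift detected.
markers = ["filled" if p < threshold else "empty" for p in p_values]
print(markers)  # → ['empty', 'empty', 'empty', 'filled', 'filled', 'filled']
```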
Monitor Drift Detection Alerts
If you have alerting configured, you should see a notification about the drift, with further details present in the alerting log.
Data drift and reference distributions comparison
To further analyse prediction data drift, you can also switch to the feature distribution tab to compare predictions against the reference data distribution. See the feature distribution monitoring demo for setup details.
Upload the income classifier reference dataset drift-reference-v2.csv
as the reference data to monitor data drift in terms of feature distributions.
Once reference data is available, you can compare the distributions of the prediction data to the reference data.
You can check whether reference data is available via the button at the top left of the Distributions dashboard. If it is not clickable and displays Reference data available, then the reference data has been loaded.

For each feature, you can click on Toggle reference data
to view reference data side by side.

We can see that the drifted data contains individuals with lower education levels that were not present in the reference data.
Troubleshooting
If you experience issues with this demo, see the troubleshooting docs and also the Knative or Elasticsearch sections.