Canary Promotion
In this demo, we will:
Deploy a pretrained SKLearn classification model based on the Iris dataset
Load test the model with prediction data
Observe the prediction requests and their responses
Observe the utilization metrics for the model
Deploy a canary XGBoost model
Load test the canary model with prediction data
Observe and compare the prediction requests and metrics for both models
Promote the canary model
In the Overview page, click Create new deployment.
In the Deployment Creation Wizard, enter the following deployment details and click Next:
Name: iris-classifier
Namespace: seldon
Type: Seldon Deployment
Configure the default predictor as follows and click Next:
Runtime: Scikit Learn
Model URI: gs://seldon-models/scv2/samples/mlserver_1.6.0/iris-sklearn
Model Project: default
Storage Secret: (leave blank/none)
Model Name: iris
Click Next for the remaining steps, then click Launch.
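For reference, the wizard produces a SeldonDeployment custom resource behind the scenes. A minimal sketch of what it might look like with the values above, assuming the prepackaged Scikit-learn server (field names follow the Seldon Core v1 CRD; exact output may differ):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-classifier
  namespace: seldon
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: iris                     # must match the name in model-settings.json
        implementation: SKLEARN_SERVER # prepackaged Scikit-learn runtime
        modelUri: gs://seldon-models/scv2/samples/mlserver_1.6.0/iris-sklearn
```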
When your deployment is launched successfully, the status reads Available.
Once the deployment is in an Available status, navigate to its Dashboard page by clicking on it.
In the Dashboard page, scroll down to find the Requests Monitor section and click Start a load test with the following details:
Connections(total): 1
Load Parameter: Duration(seconds)
Value: 120
JSON payload:
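For example, assuming the Seldon (v1) prediction protocol used by Seldon Deployments, a minimal payload covering the four Iris features could be (the feature values here are illustrative):

```json
{
  "data": {
    "names": ["sepal length", "sepal width", "petal length", "petal width"],
    "ndarray": [[1, 2, 3, 4]]
  }
}
```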
This will create a Kubernetes Job that sends prediction requests to the SKLearn model in the deployment for the specified duration.
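If you want to reproduce one of these requests by hand, here is a sketch using curl, assuming the standard Seldon Core v1 REST path and a placeholder ingress host:

```bash
# Placeholder: substitute your cluster's ingress address.
INGRESS_HOST=<your-ingress-host>

# Send a single prediction to the iris-classifier deployment in the seldon namespace.
# Standard Seldon Core v1 REST path: /seldon/<namespace>/<deployment>/api/v1.0/predictions
curl -s -X POST "http://$INGRESS_HOST/seldon/seldon/iris-classifier/api/v1.0/predictions" \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [[1, 2, 3, 4]]}}'
```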
After the load test has started, you can monitor the incoming requests and their responses by navigating to the Requests page of the deployment.
You can also monitor live request metrics resulting from the load test by navigating back to the Dashboard page of the deployment and scrolling down to the Live Requests section. In this screenshot you can see the number of requests per second and the average latency of the model.
Furthermore, you can monitor the utilization metrics for the model in the Dashboard page of the deployment. Scroll down to the Resource Monitor section to see the CPU and memory utilization of the model.
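The same utilization can be cross-checked from the command line, assuming you have kubectl access to the cluster and metrics-server installed:

```bash
# Show CPU/memory usage for the deployment's pods in the seldon namespace
# (pod names will vary; requires metrics-server).
kubectl top pods -n seldon
```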
The next step is to create an XGBoost canary model that shares a percentage of the traffic with the main model.
Navigate to the Dashboard of the deployment and click Add Canary.
In the Update Deployment Wizard, configure the canary predictor as follows:
Runtime: XGBoost
Model URI: gs://seldon-models/xgboost/iris
Model Project: default
Storage Secret: (leave blank/none)
Canary Traffic Percentage: 10
Model Name: iris
Click Next for the remaining steps, then click Launch.
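Behind the scenes, the wizard adds a second predictor with a traffic split. Continuing the earlier sketch, the predictors section of the SeldonDeployment might now look like this (predictor names and exact values are assumptions based on the wizard inputs above):

```yaml
# Trimmed to spec.predictors; the rest of the resource is unchanged.
predictors:
  - name: default            # main SKLearn predictor
    traffic: 90              # receives ~90% of requests
    graph:
      name: iris
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/scv2/samples/mlserver_1.6.0/iris-sklearn
  - name: canary             # XGBoost canary
    traffic: 10              # Canary Traffic Percentage from the wizard
    graph:
      name: iris
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
```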
While the canary model is being launched, the deployment status changes to an Updating state.
When the canary model is launched successfully, the deployment status becomes Available.
This creates a new canary with the XGBoost model, and roughly 10% of the traffic is sent to it.
Note: The deployment status represents the overall status of the deployment, including both the main and canary models.
This time, we will create a new load test with the canary model running and observe the requests and metrics for both models. You can use either the same JSON payload from the previous load test or construct a new one with different values or number of predictions.
Remember that roughly 10% of the traffic is sent to the canary model. If, however, the canary model is not available, all the traffic is sent to the main model.
After the second load test has started, you can monitor the incoming requests and their responses by navigating to the Requests page of the deployment. Since two models are running, you can filter the requests by model name to see the requests and responses for each one. To see the requests and responses for the canary model, filter the requests by clicking on the reverse pyramid icon, then click on the Node Selector dropdown and select the canary predictor.
You can also monitor live request metrics for both models by navigating back to the Dashboard page of the deployment and scrolling down to the Live Requests section. In this screenshot you can see the number of requests per second and the average latency for both models. As expected, the main model receives more requests than the canary model, so its requests-per-second figure is higher.
Furthermore, you can monitor the utilization metrics for both models in the Dashboard page of the deployment. Scroll down to the Resource Monitor section to see the CPU and memory utilization for both models.
Great! Now we have observed the requests and metrics for both models. If we are happy with how the canary model is performing, we can promote it to become the main model.
Navigate to the Dashboard of the deployment and click the Promote Canary button.
In the Promote Canary dialog, click Confirm to promote the canary model to the main model.
If the canary model is promoted successfully, the deployment status will become Available.
The Model Name is linked to the name described in the model-settings.json file, located in the Google Cloud Storage location. Changing the name in the JSON file would also require changing the Model Name, and vice versa.
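For reference, a minimal model-settings.json for this model might look like the following; the implementation value assumes MLServer's Scikit-learn runtime:

```json
{
  "name": "iris",
  "implementation": "mlserver_sklearn.SKLearnModel"
}
```

The name field is the value that must stay in sync with the Model Name entered in the wizard.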
You can use an SKLearn classification model based on the well-known Iris dataset. This dataset includes 150 samples of iris flowers, each with four features measured in centimeters: sepal length, sepal width, petal length, and petal width. The samples are labeled by iris species (setosa, versicolor, and virginica), with an even distribution across these classes.
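To see what such a model looks like in code, here is a short sketch that trains a comparable classifier locally. It is illustrative only and not the exact artifact behind the Model URI above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the 150-sample Iris dataset (4 features, 3 evenly distributed classes).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a simple classifier; the deployed artifact may use a different estimator.
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```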