Canary Promotion
In this demo, we will:
Deploy a pretrained SKLearn classification model based on the Iris dataset
Load test the model with prediction data
Observe the prediction requests and their responses
Observe the utilization metrics for the model
Deploy a canary XGBoost model
Load test the canary model with prediction data
Observe and compare the prediction requests and metrics for both models
Promote the canary model
In the Overview page, click Create new deployment.
In the Deployment Creation Wizard, enter the following deployment details and click Next:
Name: iris-classifier
Namespace: seldon
Type: Seldon Deployment
Configure the default predictor as follows and click Next:
Runtime: Scikit Learn
Model URI: gs://seldon-models/scv2/samples/mlserver_1.6.0/iris-sklearn
Model Project: default
Storage Secret: (leave blank/none)
Model Name: iris
Click Next for the remaining steps, then click Launch.
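For reference, the wizard produces a SeldonDeployment custom resource behind the scenes. A minimal sketch of what it might look like with the values above, assuming the prepackaged Scikit-learn server (field names follow the Seldon Core v1 CRD; exact output may differ):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-classifier
  namespace: seldon
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: iris                     # must match the name in model-settings.json
        implementation: SKLEARN_SERVER # prepackaged Scikit-learn runtime
        modelUri: gs://seldon-models/scv2/samples/mlserver_1.6.0/iris-sklearn
```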
When your deployment is launched successfully, the status reads Available.
Once the deployment is in an Available status, navigate to its Dashboard page by clicking on it.
In the Dashboard page, scroll down to find the Requests Monitor section and click Start a load test with the following details:
Connections(total): 1
Load Parameter: Duration(seconds)
Value: 120
JSON payload:
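For example, assuming the Seldon (v1) prediction protocol used by Seldon Deployments, a minimal payload covering the four Iris features could be (the feature values here are illustrative):

```json
{
  "data": {
    "names": ["sepal length", "sepal width", "petal length", "petal width"],
    "ndarray": [[1, 2, 3, 4]]
  }
}
```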
This will create a Kubernetes Job that sends prediction requests to the SKLearn model in the deployment for the specified duration.
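If you want to reproduce one of these requests by hand, here is a sketch using curl, assuming the standard Seldon Core v1 REST path and a placeholder ingress host:

```bash
# Placeholder: substitute your cluster's ingress address.
INGRESS_HOST=<your-ingress-host>

# Send a single prediction to the iris-classifier deployment in the seldon namespace.
# Standard Seldon Core v1 REST path: /seldon/<namespace>/<deployment>/api/v1.0/predictions
curl -s -X POST "http://$INGRESS_HOST/seldon/seldon/iris-classifier/api/v1.0/predictions" \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [[1, 2, 3, 4]]}}'
```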
After the load test has started, you can monitor the incoming requests and their responses by navigating to the Requests page of the deployment.
You can also monitor live request metrics resulting from the load test by navigating back to the Dashboard page of the deployment and scrolling down to the Live Requests section. In this screenshot you can see the number of requests per second and the average latency of the model.
Furthermore, you can monitor the utilization metrics for the model in the Dashboard page of the deployment. Scroll down to the Resource Monitor section to see the CPU and memory utilization of the model.
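The same utilization can be cross-checked from the command line, assuming you have kubectl access to the cluster and metrics-server installed:

```bash
# Show CPU/memory usage for the deployment's pods in the seldon namespace
# (pod names will vary; requires metrics-server).
kubectl top pods -n seldon
```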
The next step is to create an XGBoost canary model that shares a percentage of the traffic with the main model.
Navigate to the Dashboard of the deployment and click Add Canary.
In the Update Deployment Wizard, configure the canary predictor as follows:
Runtime: XGBoost
Model URI: gs://seldon-models/xgboost/iris
Model Project: default
Storage Secret: (leave blank/none)
Canary Traffic Percentage: 10
Model Name: iris
Click Next for the remaining steps, then click Launch.
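Behind the scenes, the wizard adds a second predictor with a traffic split. Continuing the earlier sketch, the predictors section of the SeldonDeployment might now look like this (predictor names and exact values are assumptions based on the wizard inputs above):

```yaml
# Trimmed to spec.predictors; the rest of the resource is unchanged.
predictors:
  - name: default            # main SKLearn predictor
    traffic: 90              # receives ~90% of requests
    graph:
      name: iris
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/scv2/samples/mlserver_1.6.0/iris-sklearn
  - name: canary             # XGBoost canary
    traffic: 10              # Canary Traffic Percentage from the wizard
    graph:
      name: iris
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
```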
While the canary model is being launched, the deployment status changes to an Updating state.
When the canary model is launched successfully, the deployment status becomes Available.
This creates a new canary with the XGBoost model, and roughly 10% of the traffic is sent to it.
Note: The deployment status represents the overall status of the deployment, including both the main and canary models.
This time, we will create a new load test with the canary model running and observe the requests and metrics for both models. You can use either the same JSON payload from the previous load test or construct a new one with different values or number of predictions.
Remember that roughly 10% of the traffic is sent to the canary model. If, however, the canary model is not available, all the traffic is sent to the main model.
After the second load test has started, you can monitor the incoming requests and their responses by navigating to the Requests page of the deployment. Since two models are running, you can filter the requests by model name to see the requests and responses for each one. To see the requests and responses for the canary model, filter the requests by clicking on the reverse pyramid icon, then click on the Node Selector dropdown and select the canary predictor.
You can also monitor live request metrics for both models by navigating back to the Dashboard page of the deployment and scrolling down to the Live Requests section. In this screenshot you can see the number of requests per second and the average latency for both models. As expected, the main model receives more requests than the canary model, so its requests-per-second figure is higher.
Furthermore, you can monitor the utilization metrics for both models in the Dashboard page of the deployment. Scroll down to the Resource Monitor section to see the CPU and memory utilization for both models.
Great! Now we have observed the requests and metrics for both models. If we are happy with how the canary model is performing, we can promote it to become the main model.
Navigate to the Dashboard of the deployment and click the Promote Canary button.
In the Promote Canary dialog, click Confirm to promote the canary model to the main model.
If the canary model is promoted successfully, the deployment status will become Available.
The Model Name is linked to the name described in the model-settings.json file, located in the Google Cloud Storage location. Changing the name in the JSON file would also require changing the Model Name, and vice versa.
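For reference, a minimal model-settings.json for this model might look like the following; the implementation value assumes MLServer's Scikit-learn runtime:

```json
{
  "name": "iris",
  "implementation": "mlserver_sklearn.SKLearnModel"
}
```

The name field is the value that must stay in sync with the Model Name entered in the wizard.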
You can use an SKLearn classification model based on the well-known Iris dataset. This dataset includes 150 samples of iris flowers, each with four features measured in centimeters: sepal length, sepal width, petal length, and petal width. The samples are labeled by iris species (setosa, versicolor, and virginica), with an even distribution across these classes.
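To see what such a model looks like in code, here is a short sketch that trains a comparable classifier locally. It is illustrative only and not the exact artifact behind the Model URI above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the 150-sample Iris dataset (4 features, 3 evenly distributed classes).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a simple classifier; the deployed artifact may use a different estimator.
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```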