# Canary Promotion

In this demo, we will:

* Deploy a pretrained SKLearn classification model based on the Iris dataset
* Load test the model with prediction data
* Observe the prediction requests and their responses
* Observe the utilization metrics for the model
* Deploy a canary XGBoost model
* Load test the canary model with prediction data
* Observe and compare the prediction requests and metrics for both models
* Promote the canary model

## Iris Model

This demo uses an SKLearn classification model trained on the well-known [Iris dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset). The dataset contains 150 samples of iris flowers, each with 4 features: sepal length, sepal width, petal length, and petal width, measured in centimeters. The samples are labeled by iris species (setosa, versicolor, and virginica), with an even distribution across the three classes.

## Launch a Seldon Deployment

1. In the `Overview` page, click `Create new deployment`.
2. In the `Deployment Creation Wizard`, enter the following deployment details and click `Next`:
   * Name: *iris-classifier*
   * Namespace: *seldon*
   * Type: *Seldon Deployment*

![Deployment details](/files/7ANLigbLGY1Z9wzLo9rF)

3. Configure the default predictor as follows and click `Next`:
   * Runtime: *Scikit Learn*
   * Model URI: *gs\://seldon-models/scv2/samples/mlserver\_1.6.0/iris-sklearn*
   * Model Project: *default*
   * Storage Secret: *(leave blank/none)*
   * Model Name: *iris*

{% hint style="warning" %}
The `Model Name` must match the name defined in the `model-settings.json` file, which is stored alongside the model in Google Cloud Storage. If you change the name in the JSON file, you must also change the `Model Name`, and vice versa.
{% endhint %}

![Default predictor spec](/files/6mz9NAh3IX0zGIdZIh5x)

4. Click `Next` for the remaining steps, then click `Launch`.
5. When your deployment is launched successfully, the status reads `Available`.
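
The warning above about `Model Name` can be illustrated with a small check. This is only a sketch: the contents of `MODEL_SETTINGS_JSON` are an assumed example of a `model-settings.json` file (the real file in the bucket is not shown in this demo and may contain other fields), and `model_name_matches` is a hypothetical helper, not part of the platform.

```python
import json

# Assumed example contents of a model-settings.json file. The real file in the
# bucket is not shown in this demo and may differ.
MODEL_SETTINGS_JSON = """
{
  "name": "iris",
  "implementation": "mlserver_sklearn.SKLearnModel"
}
"""


def model_name_matches(settings_text: str, wizard_model_name: str) -> bool:
    """Return True when the wizard's Model Name equals `name` in the settings file."""
    settings = json.loads(settings_text)
    return settings.get("name") == wizard_model_name


print(model_name_matches(MODEL_SETTINGS_JSON, "iris"))
print(model_name_matches(MODEL_SETTINGS_JSON, "iris-v2"))
```

The first call matches the value entered in the wizard in this demo; the second shows the kind of mismatch the warning describes.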

## Start Load Test

1. Once the deployment status is `Available`, click the deployment to open its `Dashboard` page.
2. On the `Dashboard` page, scroll down to the `Requests Monitor` section and click `Start a load test` with the following details:
   * Connections(total): *1*
   * Load Parameter: *Duration(seconds)*
   * Value: *120*
   * JSON payload:

```json
{
  "inputs": [
    {
      "name": "predict",
      "data": [
        0.38606369295833043,
        0.006894049558299753,
        0.6104082981607108,
        0.3958954239450676
      ],
      "datatype": "FP64",
      "shape": [
        1,
        4
      ]
    }
  ]
}
```

![Load Test Wizard](/files/AlsG3gQYZQHuJkODYIVF)

This creates a Kubernetes Job that sends prediction requests to the SKLearn model in the deployment for the specified number of seconds.
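
If you want to construct a payload with different values or more than one prediction, the structure of the JSON payload above can be generated programmatically. The following is a minimal sketch, assuming the same input name (`predict`) and datatype (`FP64`) as the payload above; `build_payload` is a hypothetical helper and not part of the platform.

```python
import json
from typing import Dict, List, Sequence


def build_payload(rows: Sequence[Sequence[float]], input_name: str = "predict") -> Dict:
    """Build a request body with one FP64 input tensor of shape [n_rows, n_features]."""
    if not rows:
        raise ValueError("at least one row is required")
    n_features = len(rows[0])
    if any(len(row) != n_features for row in rows):
        raise ValueError("all rows must have the same number of features")
    # The data is sent as a flat list in row-major order; `shape` describes
    # how to interpret it.
    data: List[float] = [float(value) for row in rows for value in row]
    return {
        "inputs": [
            {
                "name": input_name,
                "data": data,
                "datatype": "FP64",
                "shape": [len(rows), n_features],
            }
        ]
    }


payload = build_payload(
    [[0.38606369295833043, 0.006894049558299753, 0.6104082981607108, 0.3958954239450676]]
)
print(json.dumps(payload, indent=2))
```

Passing two rows of four features each would produce a `shape` of `[2, 4]` and a `data` list of eight values.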

## Observe the prediction requests and their responses

After the load test has started, you can monitor the incoming requests and their responses by navigating to the `Requests` page of the deployment.

![Requests page showing prediction requests and their responses](/files/IQCcUlh1rq1C8F6qHXit)

You can also monitor the live request metrics produced by the load test: navigate back to the `Dashboard` page of the deployment and scroll down to the `Live Requests` section. The screenshot below shows the number of requests per second and the average latency of the model.

![Live Requests section showing the number of requests per second and the average latency of the model](/files/c1qJXE9ncBd2oimYEymK)

## Observe the utilization metrics for the model

Furthermore, you can monitor the utilization metrics for the model in the `Dashboard` page of the deployment. Scroll down to the `Resource Monitor` section to see the CPU and memory utilization of the model.

![Resource Monitor section showing the CPU and memory utilization of the model](/files/XJtVLDwVp616IDh3vrq5)

## Deploy a Canary model

The next step is to create an XGBoost canary model that shares a percentage of the traffic with the main model.

1. Navigate to the `Dashboard` of the deployment and click `Add Canary`.
2. In the `Update Deployment Wizard`, configure the canary predictor as follows:
   * Runtime: *XGBoost*
   * Model URI: *gs\://seldon-models/xgboost/iris*
   * Model Project: *default*
   * Storage Secret: *(leave blank/none)*
   * Canary Traffic Percentage: *10*
   * Model Name: *iris*

{% hint style="warning" %}
The `Model Name` must match the name defined in the `model-settings.json` file, which is stored alongside the model in Google Cloud Storage. If you change the name in the JSON file, you must also change the `Model Name`, and vice versa.
{% endhint %}

![Canary Deployment Wizard](/files/HUO8yGSwFYQUUTTOQid3)

3. Click on `Next` for the remaining steps, then click `Launch`.
4. While the canary model is being launched, the deployment status changes to an `Updating` state.
5. When the canary model is launched successfully, the deployment status becomes `Available`.

This creates a canary deployment with the XGBoost model, and roughly 10% of the traffic is sent to it.
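
To make the traffic split concrete, the sketch below models a Seldon Core v1 `SeldonDeployment` resource with two predictors as a Python dictionary. It is an illustration under assumptions, not the manifest the wizard generates: the predictor names (`default`, `canary`) and the `canary_manifest` helper are hypothetical, and only the fields relevant to the traffic split are shown.

```python
from typing import Dict


def canary_manifest(canary_percent: int) -> Dict:
    """Sketch of a two-predictor resource; the main model gets the remaining traffic."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": "iris-classifier", "namespace": "seldon"},
        "spec": {
            "predictors": [
                {
                    "name": "default",  # hypothetical predictor name
                    "traffic": 100 - canary_percent,
                    "graph": {
                        "name": "iris",
                        "implementation": "SKLEARN_SERVER",
                        "modelUri": "gs://seldon-models/scv2/samples/mlserver_1.6.0/iris-sklearn",
                    },
                },
                {
                    "name": "canary",  # hypothetical predictor name
                    "traffic": canary_percent,
                    "graph": {
                        "name": "iris",
                        "implementation": "XGBOOST_SERVER",
                        "modelUri": "gs://seldon-models/xgboost/iris",
                    },
                },
            ]
        },
    }


manifest = canary_manifest(10)
for predictor in manifest["spec"]["predictors"]:
    print(predictor["name"], predictor["traffic"])
```

The point of the sketch is that the two `traffic` values always add up to 100, so a canary percentage of 10 leaves 90 for the main model.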

{% hint style="info" %}
**Note**: The deployment status represents the overall status of the deployment, including the main and canary models.
{% endhint %}

## Load test the canary model

This time, we will create a new load test with the canary model running and observe the requests and metrics for both models. You can use either the same JSON payload from the previous load test or construct a new one with different values or number of predictions.

{% hint style="warning" %}
Remember that roughly 10% of the traffic is sent to the canary model. If, however, the canary model is not available, all the traffic is sent to the main model.
{% endhint %}
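
The word "roughly" matters here: the split is a proportion of traffic, so the observed share varies from one load test to the next. The simulation below is a simplified sketch that assumes each request is routed independently at random with a 10% chance of reaching the canary. It does not reproduce the platform's actual routing, and `route` and `simulate` are hypothetical helpers.

```python
import random
from collections import Counter


def route(rng: random.Random, canary_percent: float) -> str:
    """Pick a destination for one request according to the canary percentage."""
    return "canary" if rng.random() * 100 < canary_percent else "main"


def simulate(n_requests: int, canary_percent: float, seed: int = 0) -> Counter:
    """Count how many simulated requests reach each model."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same counts
    return Counter(route(rng, canary_percent) for _ in range(n_requests))


counts = simulate(n_requests=10_000, canary_percent=10)
share = counts["canary"] / sum(counts.values())
print(f"main: {counts['main']}, canary: {counts['canary']}, canary share: {share:.1%}")
```

With a short load test and few requests, the observed canary share can deviate noticeably from 10%; it gets closer as the number of requests grows.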

## Observe the prediction requests and their responses for both models

After the second load test has started, you can monitor the incoming requests and their responses by navigating to the `Requests` page of the deployment. Since two models are now running, you can filter the requests to see the requests and responses for each model. To see the requests and responses for the canary model, click the reverse pyramid (filter) icon, open the `Node Selector` dropdown, and select the canary predictor.

![Requests page showing prediction requests and their responses for the canary model](/files/7baYHTXlDhehrCDRseYA)

You can also monitor the live request metrics for both models: navigate back to the `Dashboard` page of the deployment and scroll down to the `Live Requests` section. The screenshot below shows the number of requests per second and the average latency for both models. As expected, the main model receives more requests than the canary model, so its number of requests per second is higher.

![Live Requests section showing the number of requests per second and the average latency for both models](/files/YFi2bfSg9i0FsaZlhknC)

## Observe the utilization metrics for both models

Furthermore, you can monitor the utilization metrics for both models in the `Dashboard` page of the deployment. Scroll down to the `Resource Monitor` section to see the CPU and memory utilization for both models.

![Resource Monitor section showing the CPU and memory utilization for both models](/files/qayq3fYXCVqswlUssJGU)

## Promote the Canary model

Great! Now we have observed the requests and metrics for both models. If we are happy with how the canary model is performing, we can promote it to become the main model.
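
What "happy with how the canary model is performing" means depends on your own criteria. As one possible illustration, the sketch below compares the request count and average latency you read from the dashboard. Everything in it is an assumption made for illustration: the `ModelStats` type, the `should_promote` helper, the thresholds, and the example numbers are hypothetical and are not produced or enforced by the platform.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    requests: int           # number of requests observed during the load test
    avg_latency_ms: float   # average latency read from the dashboard


def should_promote(
    main: ModelStats,
    canary: ModelStats,
    min_requests: int = 100,         # hypothetical: need enough canary traffic to judge
    max_latency_ratio: float = 1.2,  # hypothetical: canary may be at most 20% slower
) -> bool:
    """Return True when the canary has enough traffic and acceptable latency."""
    if canary.requests == 0 or canary.requests < min_requests:
        return False
    return canary.avg_latency_ms <= main.avg_latency_ms * max_latency_ratio


# Hypothetical readings, not measurements from this demo.
main_stats = ModelStats(requests=1080, avg_latency_ms=5.0)
canary_stats = ModelStats(requests=120, avg_latency_ms=5.5)
print(should_promote(main_stats, canary_stats))
```

In practice you would also compare the predictions themselves on the `Requests` page before promoting.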

1. Navigate to the `Dashboard` of the deployment and click on the `Promote Canary` button.
2. In the `Promote Canary` dialog, click `Confirm` to promote the canary model to the main model.
3. If the canary model is promoted successfully, the deployment status will become `Available`.

![The canary model has been successfully promoted to be the main model of the deployment](/files/uXO50Aa3VfphWe1Tj7p5)

