Seldon inference is built from atomic Model components. Models as shown here cover a wide range of artifacts including:
Core machine learning models, e.g. a PyTorch model.
Feature transformations that might be built with custom python code.
Drift detectors.
Outlier detectors.
Explainers.
Adversarial detectors.
A typical workflow for a production machine learning setup might be as follows:
You create a TensorFlow model for your core application use case and test this model in isolation to validate it.
You create an SKLearn feature-transformation component that runs before your model to convert the input into the correct form. You also create drift and outlier detectors using Seldon's open-source Alibi Detect library and test these in isolation.
You join these components together into a Pipeline for the final production setup.
These steps are shown in the diagram below:
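Alongside the diagram, a minimal sketch of what the manifests for such a setup might look like is shown below. All names, storage URIs, and requirements are placeholders, and the drift/outlier detectors are omitted for brevity; check the Seldon Core 2 reference for the full schema.

```bash
# Hypothetical Model manifests for the feature transform and the core model.
cat > transform.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: income-transform            # placeholder SKLearn feature transform
spec:
  storageUri: "gs://my-bucket/income-transform"   # placeholder artifact location
  requirements:
  - sklearn
EOF

cat > classifier.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: income-classifier           # placeholder core TensorFlow model
spec:
  storageUri: "gs://my-bucket/income-classifier"  # placeholder artifact location
  requirements:
  - tensorflow
EOF

# Hypothetical Pipeline joining the components into one flow.
cat > pipeline.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: income-pipeline
spec:
  steps:
  - name: income-transform
  - name: income-classifier
    inputs:
    - income-transform
  output:
    steps:
    - income-classifier
EOF

seldon model load -f transform.yaml
seldon model load -f classifier.yaml
seldon pipeline load -f pipeline.yaml
```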
This section provides some examples of operating Seldon so you can test running your own models, experiments, pipelines, and explainers.
We use a simple SKLearn iris classification model; a sketch of the corresponding CLI calls follows the steps below.
Load the model
Wait for the model to be ready
Do a REST inference call
Do a gRPC inference call
Unload the model
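As a rough sketch (not verbatim from the docs), the steps above could look like this with the Seldon CLI. The model name iris, the placeholder storageUri, and the predict input tensor name are assumptions based on the standard SKLearn iris sample.

```bash
# Hypothetical Model manifest for an SKLearn iris classifier.
cat > iris.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://my-bucket/iris-sklearn"   # placeholder artifact location
  requirements:
  - sklearn
EOF

seldon model load -f iris.yaml               # load the model
seldon model status iris -w ModelAvailable   # wait until it is ready

# REST inference call (Open Inference Protocol payload)
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'

# gRPC inference call (note the protobuf-style "contents" field)
seldon model infer iris --inference-mode grpc \
  '{"model_name":"iris","inputs":[{"name":"input","contents":{"fp32_contents":[1,2,3,4]},"datatype":"FP32","shape":[1,4]}]}'

seldon model unload iris                     # unload the model
```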
We run a simple TensorFlow model. Note the requirements section specifying tensorflow; a sketch of the manifest and CLI calls follows the steps below.
Load the model.
Wait for the model to be ready.
Get model metadata
Do a REST inference call.
Do a gRPC inference call
Unload the model
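A sketch of the equivalent flow for the TensorFlow model. The model name tfsimple1, the placeholder storageUri, and the INT32 [1,16] input tensors are assumptions based on the Triton "simple" example; the gRPC call follows the same pattern as the iris example above.

```bash
# Hypothetical Model manifest; note the requirements section specifying tensorflow.
cat > tfsimple1.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: tfsimple1
spec:
  storageUri: "gs://my-bucket/tfsimple"   # placeholder artifact location
  requirements:
  - tensorflow
EOF

seldon model load -f tfsimple1.yaml
seldon model status tfsimple1 -w ModelAvailable

# Get the model metadata (Open Inference Protocol metadata via the CLI).
seldon model metadata tfsimple1

# REST inference call: two INT32 tensors of shape [1,16].
seldon model infer tfsimple1 \
  '{"inputs":[{"name":"INPUT0","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]},{"name":"INPUT1","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]}]}'

seldon model unload tfsimple1
```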
We will use two SKLearn Iris classification models to illustrate an experiment; a sketch of the experiment manifest and calls follows the steps below.
Load both models.
Wait for both models to be ready.
Create an experiment that modifies the iris model to add a second model splitting traffic 50/50 between the two.
Start the experiment.
Wait for the experiment to be ready.
Run a set of calls and record which route the traffic took. There should be roughly a 50/50 split.
Run one more request
Use the sticky session key passed by the last inference request to ensure the same route is taken each time. We will test REST and gRPC.
Stop the experiment
Show that all requests now go to the original model.
Unload both models.
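A sketch of the experiment flow above. The experiment name and the candidates/weight field names are assumptions based on the Experiment examples in the Seldon docs, so verify them against your installed CRD version.

```bash
# Hypothetical Experiment manifest splitting traffic 50/50 between iris and iris2.
cat > ab-experiment.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-sample
spec:
  default: iris            # requests addressed to iris are subject to the experiment
  candidates:
  - name: iris
    weight: 50
  - name: iris2
    weight: 50
EOF

seldon experiment start -f ab-experiment.yaml
seldon experiment status experiment-sample -w | jq -M .

# Run a batch of calls and count which model served each one; expect a rough 50/50 split.
for i in $(seq 1 50); do
  seldon model infer iris \
    '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
    | jq -r .model_name
done | sort | uniq -c

seldon experiment stop experiment-sample
```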
Examples of various model artifact types from different frameworks running under Seldon Core 2:
SKLearn
TensorFlow
XGBoost
ONNX
LightGBM
MLflow
PyTorch
Python requirements in model-zoo-requirements.txt
The training code for this model can be found at scripts/models/iris in the SCv2 repo.
The training code for this model can be found at ./scripts/models/income-xgb
This is a pretrained model, as defined in the ./scripts/models/Makefile target mnist-onnx.
The training code for this model can be found at ./scripts/models/income-lgb
The training code for this model can be found at ./scripts/models/wine-mlflow
This example model is downloaded and trained in the ./scripts/models/Makefile target mnist-pytorch.
This notebook illustrates a series of Pipelines showing different ways of combining flows of data and conditional logic. We assume you have Seldon Core 2 running locally.
Other models can be found at https://github.com/SeldonIO/triton-python-examples
Chain the output of one model into the next. This also shows changing the tensor names via tensorMap to conform to the expected input tensor names of the second model.
The pipeline below chains the output of tfsimple1 into tfsimple2. As these models have compatible shapes and data types this can be done. However, the output tensor names from tfsimple1 need to be renamed to match the input tensor names for tfsimple2. We do this with the tensorMap feature.
The output of the Pipeline is the output from tfsimple2.
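Assuming the models are named tfsimple1 and tfsimple2 as above, a Pipeline manifest along the following lines expresses this chaining; check the Pipeline reference for the exact schema of your installed version.

```bash
cat > tfsimples.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: tfsimples              # chains tfsimple1 into tfsimple2
spec:
  steps:
  - name: tfsimple1
  - name: tfsimple2
    inputs:
    - tfsimple1
    tensorMap:                 # rename tfsimple1 outputs to tfsimple2's expected inputs
      tfsimple1.outputs.OUTPUT0: INPUT0
      tfsimple1.outputs.OUTPUT1: INPUT1
  output:
    steps:
    - tfsimple2                # the pipeline output is tfsimple2's output
EOF
seldon pipeline load -f tfsimples.yaml
```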
We use the Seldon CLI pipeline inspect feature to look at the data for all steps of the pipeline for the last data item passed through the pipeline (the default). This can be useful for debugging.
Next, we get the output as JSON and use the jq tool to extract just one value.
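A sketch of those two calls, assuming the pipeline is named tfsimples as above; the JSON output flag and the exact JSON structure consumed by jq are assumptions to be checked against your CLI version.

```bash
# Inspect the most recent data that flowed through each pipeline step (useful for debugging).
seldon pipeline inspect tfsimples

# Assumed JSON output mode and message structure; adjust the jq path to what inspect returns.
seldon pipeline inspect tfsimples --format json | jq -M '.topics[0].msgs[0].value'
```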
Chain the output of one model into the next. This shows how step inputs and outputs can be used and combined.
Join two flows of data from two models as input to a third model. This shows how individual flows of data can be combined.
In the pipeline below, the input to tfsimple3 joins one output tensor from each of the two previous models, tfsimple1 and tfsimple2. We need to use the tensorMap feature to rename each output tensor to one of the expected input tensors for the tfsimple3 model.
The output is the sequence "2,4,6...", which conforms to the logic of this model (addition and subtraction) when fed the outputs of the first two models.
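Assuming the model names above, such a joining pipeline might be expressed as follows (schema per your installed Pipeline CRD).

```bash
cat > tfsimples-join.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: tfsimples-join
spec:
  steps:
  - name: tfsimple1
  - name: tfsimple2
  - name: tfsimple3
    inputs:                    # join one output tensor from each upstream model
    - tfsimple1.outputs.OUTPUT0
    - tfsimple2.outputs.OUTPUT1
    tensorMap:                 # rename them to tfsimple3's expected input tensors
      tfsimple1.outputs.OUTPUT0: INPUT0
      tfsimple2.outputs.OUTPUT1: INPUT1
  output:
    steps:
    - tfsimple3
EOF
seldon pipeline load -f tfsimples-join.yaml
```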
Shows conditional data flows: one of two models is run based on the output tensors from the first.
Here we assume the conditional model can output two tensors, OUTPUT0 and OUTPUT1, but only outputs the former if the CHOICE input tensor is set to 0; otherwise it outputs tensor OUTPUT1. By this means only one of the two downstream models will receive data and run. The output step does an any join from both models, and whichever data appears first will be sent as the output of the pipeline. As only one of the two models add10 and mul10 runs in this case, we will receive its output.
The mul10 model will run, as the CHOICE tensor is set to 0.
The add10 model will run, as the CHOICE tensor is not set to zero.
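A sketch of such a conditional pipeline, assuming the step and tensor names above; the stepsJoin field name is taken from the Pipeline examples and should be verified against your installed CRD version.

```bash
cat > tfsimple-conditional.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: tfsimple-conditional
spec:
  steps:
  - name: conditional
  - name: mul10
    inputs:
    - conditional.outputs.OUTPUT0   # produced only when CHOICE is 0
  - name: add10
    inputs:
    - conditional.outputs.OUTPUT1   # produced otherwise
  output:
    steps:
    - mul10
    - add10
    stepsJoin: any                  # forward whichever branch produces data
EOF
seldon pipeline load -f tfsimple-conditional.yaml
```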
Access to individual tensors in pipeline inputs
This pipeline shows how we can access pipeline inputs INPUT0 and INPUT1 from different steps.
Shows how joins can be used for triggers as well.
Here we require tensors named ok1 or ok2 to exist on the pipeline inputs to run the mul10 model, but require tensor ok3 to exist on the pipeline inputs to run the add10 model. The logic on mul10 is handled by a trigger join of any, meaning either of these input tensors can exist to satisfy the trigger join.
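A sketch of that trigger logic. The pipeline name, the INPUT tensor name, and the triggers/triggersJoinType field names are assumptions to be checked against the Pipeline reference for your version.

```bash
cat > trigger-joins.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: trigger-joins
spec:
  steps:
  - name: mul10
    inputs:
    - trigger-joins.inputs.INPUT
    triggers:                        # runs if ok1 OR ok2 is present
    - trigger-joins.inputs.ok1
    - trigger-joins.inputs.ok2
    triggersJoinType: any
  - name: add10
    inputs:
    - trigger-joins.inputs.INPUT
    triggers:                        # runs only if ok3 is present
    - trigger-joins.inputs.ok3
  output:
    steps:
    - mul10
    - add10
    stepsJoin: any
EOF
seldon pipeline load -f trigger-joins.yaml
```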
We will use two SKLearn Iris classification models to illustrate experiments.
Load both models.
Wait for both models to be ready.
Create an experiment that modifies the iris model to add a second model splitting traffic 50/50 between the two.
Start the experiment.
Wait for the experiment to be ready.
Run a set of calls and record which route the traffic took. There should be roughly a 50/50 split.
Show the sticky session header x-seldon-route that is returned.
Use the sticky session key passed by the last inference request to ensure the same route is taken each time, as sketched below.
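A sketch of how the sticky session key might be reused, assuming a local install exposing the Seldon mesh on localhost:9000 and that the returned x-seldon-route header is honoured when passed back on later requests.

```bash
# First request: capture the x-seldon-route header identifying the chosen candidate.
ROUTE=$(curl -si http://localhost:9000/v2/models/iris/infer \
  -H "Content-Type: application/json" -H "Seldon-Model: iris" \
  -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  | grep -i '^x-seldon-route:' | awk '{print $2}' | tr -d '\r')
echo "sticky route: ${ROUTE}"

# Reuse the route on subsequent requests so they hit the same model.
curl -s http://localhost:9000/v2/models/iris/infer \
  -H "Content-Type: application/json" -H "Seldon-Model: iris" \
  -H "x-seldon-route: ${ROUTE}" \
  -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
```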
Stop the experiment
Unload both models.
We will use two SKLearn Iris classification models to illustrate a model with a mirror.
Load both models.
Wait for both models to be ready.
Create an experiment that modifies the iris model so that traffic to iris is also mirrored to iris2 (a sketch of such a manifest follows these steps).
Start the experiment.
Wait for the experiment to be ready.
We get responses from iris, but all requests will also have been mirrored to iris2.
We can check the local Prometheus port on the agent to validate that requests went to iris2.
Stop the experiment
Unload both models.
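The mirror experiment used in the steps above might be expressed roughly as follows; the mirror/percent/weight field names follow the Experiment examples in the Seldon docs and should be verified against your installed CRD version.

```bash
# Hypothetical Experiment manifest mirroring traffic for iris to iris2.
# Mirrored requests do not affect the response returned to the caller.
cat > mirror-experiment.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: sklearn-mirror
spec:
  default: iris
  candidates:
  - name: iris
    weight: 100          # all responses come from iris
  mirror:
    name: iris2
    percent: 100         # every request is also copied to iris2
EOF
seldon experiment start -f mirror-experiment.yaml
```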
Let's check that the mul10 model was called.
Let's do an HTTP call and check the two models again.
This notebook will show how we can update running experiments.
We will use three SKLearn Iris classification models to illustrate experiment updates.
Load all models.
Let's call all three models individually first.
We will start an experiment to change the iris endpoint to split traffic with the iris2 model.
Now when we call the iris model we should see a roughly 50/50 split between the two models.
Now we update the experiment to change to a split with the iris3 model.
Now we should see a split with the iris3 model.
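One way to express the update is to re-apply an Experiment manifest with the same name but a changed candidate list; the names and field names here are assumptions based on the examples above.

```bash
# Same experiment name, but the iris2 candidate is swapped for iris3.
cat > experiment.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-sample
spec:
  default: iris
  candidates:
  - name: iris
    weight: 50
  - name: iris3          # previously iris2
    weight: 50
EOF
seldon experiment start -f experiment.yaml
seldon experiment status experiment-sample -w | jq -M .
```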
Now that the experiment has been stopped, we check everything is as before.
Here we test changing the model whose traffic we want to split. We will use three SKLearn Iris classification models to illustrate this.
Let's call all three models to verify initial conditions.
Now we start an experiment to change calls to the iris model to split with the iris2 model.
Run a set of calls and record which route the traffic took. There should be roughly a 50/50 split.
Now let's change the model the experiment modifies to the iris3 model, splitting between that and iris2.
Let's check that the iris model now behaves as before, but the iris3 model has its traffic split.
Finally, let's check that now the experiment has stopped everything is as at the start.
To install tritonclient:
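The client libraries are published on PyPI; a typical install looks like the following (the [all] extra is an assumption that you want both the HTTP and gRPC clients).

```bash
pip install "tritonclient[all]"
```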
Note: binary data support in HTTP is blocked by https://github.com/SeldonIO/seldon-core-v2/issues/475
This example runs you through a series of batch inference requests made to both models and pipelines running on Seldon Core locally.
Deprecated: The MLServer CLI infer feature is experimental and will be removed in future work.
First, let's jump into the samples folder where we'll find some sample models and pipelines we can use:
Let's take a look at a sample model before we deploy it:
Let's now deploy that model using the Seldon CLI:
We see that this pipeline only has one step, which is to call the iris model we deployed earlier. We can create the pipeline by running:
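A sketch of the two deploy commands, assuming the sample manifests sit under models/ and pipelines/ in the samples folder (adjust the paths to your checkout).

```bash
seldon model load -f models/sklearn-iris-gs.yaml   # deploy the iris model
seldon pipeline load -f pipelines/iris.yaml        # deploy the one-step pipeline
```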
The tensorflow model takes two arrays as inputs and returns two arrays as outputs. The first output is the addition of the two inputs and the second output is the value of (first input - second input).
Let's deploy the model:
Just as we did for the scikit-learn model, we'll deploy a simple pipeline for our tensorflow model:
Inspect the pipeline manifest:
and deploy it:
Once we've deployed a model or pipeline to Seldon Core, we can list them and check their status.
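With the Seldon CLI this looks something like:

```bash
# List what is currently deployed and its status.
seldon model list
seldon pipeline list
```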
Your models and pipelines should be showing a state of ModelAvailable and PipelineReady respectively.
Before we run a large batch job of predictions through our models and pipelines, let's quickly check that they work with a single standalone inference request. We can do this using the seldon model infer command.
The prediction request body needs to be an Open Inference Protocol compatible payload and must also match the expected inputs for the model you've deployed. In this case, the iris model expects data of shape [1, 4] and of type FP32.
You'll notice that the prediction results for this request come back on outputs[0].data.
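A sketch of such a request; the predict input tensor name is an assumption based on the standard SKLearn iris sample.

```bash
# Single REST inference request against the deployed iris model.
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
```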
You'll notice that the inputs for our tensorflow model look different from the ones we sent to the iris model. This time, we're sending two arrays of shape [1,16]. When sending an inference request, we can optionally choose which outputs we want back by including an {"outputs":...} object.
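A sketch of such a request, assuming the model is deployed under the name tfsimple and takes INT32 tensors as in the Triton "simple" example; the outputs object restricts the response to OUTPUT0.

```bash
# Two [1,16] input tensors plus an explicit outputs selection.
seldon model infer tfsimple \
  '{"inputs":[{"name":"INPUT0","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]},{"name":"INPUT1","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]}],"outputs":[{"name":"OUTPUT0"}]}'
```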
In the samples folder there is a batch request input file, batch-inputs/iris-input.txt, which contains 100 input payloads for our iris model. Let's take a look at the first line in that file:
To run a batch inference job we'll use the MLServer CLI. If you don't already have it installed you can install it using:
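MLServer is published on PyPI, so an install along these lines should work:

```bash
pip install mlserver
```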
The inference job can be executed by running the following command:
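A command along the following lines should work, using the input file, worker count, and output path described below; check mlserver infer --help for the exact flags of your MLServer version.

```bash
# Batch inference against the locally running iris model over the V2/OIP endpoint.
mlserver infer -u localhost:9000 -m iris \
  -i batch-inputs/iris-input.txt \
  -o /tmp/iris-output.txt \
  --workers 5
```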
The mlserver batch component will take your input file batch-inputs/iris-input.txt, distribute those payloads across 5 different workers (--workers 5), collect the responses, and write them to a file /tmp/iris-output.txt. For a full set of options, check out the MLServer CLI Reference.
We can check the inference responses by looking at the contents of the output file:
We can run the same batch job for our iris pipeline and store the outputs in a different file:
We can check the inference responses by looking at the contents of the output file:
The samples folder contains an example batch input for the tensorflow model, just as it did for the scikit-learn model. You can find it at batch-inputs/tfsimple-input.txt. Let's take a look at the first inference request in the file:
As before, we can run the inference batch job using the mlserver infer command:
We can check the inference responses by looking at the contents of the output file:
You should get the following response:
We can check the inference responses by looking at the contents of the output file:
Now that we've run our batch examples, let's remove the models and pipelines we created:
And finally let's spin down our local instance of Seldon Core:
In this example we create a Pipeline to chain two HuggingFace models to provide speech-to-sentiment functionality, and add an explainer to understand the result.
This example also illustrates how explainers can target pipelines to allow complex explanations flows.
This example requires the ffmpeg package to be installed locally. Run make install-requirements for the Python dependencies.
Create a method to load speech from the recorder, transform it into mp3, and send it as base64 data. When the result returns, extract and show the text and sentiment.
We will load two Huggingface models for speech to text and text to sentiment.
To allow Alibi-Explain to more easily explain the sentiment, we will need:
Input and output transforms that take the dict values consumed and produced by the HuggingFace sentiment model and turn them into values that Alibi-Explain can easily understand, namely the core values we want to explain and the outputs from the sentiment model.
A separate Pipeline to allow us to join the sentiment model with the output transform.
These transform models are MLServer custom runtimes as shown below:
We can now create the final pipeline that will take speech and generate sentiment, along with an explanation of why that sentiment was predicted.
We will wait for the explanation, which runs asynchronously to the functional output from the Pipeline above.
To run this notebook you need the inference data. This can be acquired in two ways:
Run make train, or
gsutil cp -R gs://seldon-models/scv2/examples/income/infer-data .
Show predictions from the reference set. There should be no drift or outliers.
Show predictions from the drift data. There should be drift but probably no outliers.
Show predictions from the outlier data. There should be outliers but probably no drift.