We use a simple sklearn iris classification model
cat ./models/sklearn-iris-gs.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
name: iris
spec:
storageUri: "gs://seldon-models/scv2/samples/mlserver_1.3.5/iris-sklearn"
requirements:
- sklearn
memory: 100Ki
Load the model
seldon model load -f ./models/sklearn-iris-gs.yaml
{}
Wait for the model to be ready
seldon model status iris -w ModelAvailable | jq -M .
{}
Do a REST inference call
seldon model infer iris \
'{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
{
"model_name": "iris_1",
"model_version": "1",
"id": "983bd95f-4b4d-4ff1-95b2-df9d6d089164",
"parameters": {},
"outputs": [
{
"name": "predict",
"shape": [
1,
1
],
"datatype": "INT64",
"parameters": {
"content_type": "np"
},
"data": [
2
]
}
]
}
Do a gRPC inference call
seldon model infer iris --inference-mode grpc \
'{"model_name":"iris","inputs":[{"name":"input","contents":{"fp32_contents":[1,2,3,4]},"datatype":"FP32","shape":[1,4]}]}' | jq -M .
{
"modelName": "iris_1",
"modelVersion": "1",
"outputs": [
{
"name": "predict",
"datatype": "INT64",
"shape": [
"1",
"1"
],
"parameters": {
"content_type": {
"stringParam": "np"
}
},
"contents": {
"int64Contents": [
"2"
]
}
}
]
}
Unload the model
seldon model unload iris
We run a simple tensorflow model. Note the requirements section specifying tensorflow
.
cat ./models/tfsimple1.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
name: tfsimple1
spec:
storageUri: "gs://seldon-models/triton/simple"
requirements:
- tensorflow
memory: 100Ki
Load the model.
seldon model load -f ./models/tfsimple1.yaml
{}
Wait for the model to be ready.
seldon model status tfsimple1 -w ModelAvailable | jq -M .
{}
Get model metadata
seldon model metadata tfsimple1
{
"name": "tfsimple1_1",
"versions": [
"1"
],
"platform": "tensorflow_graphdef",
"inputs": [
{
"name": "INPUT0",
"datatype": "INT32",
"shape": [
-1,
16
]
},
{
"name": "INPUT1",
"datatype": "INT32",
"shape": [
-1,
16
]
}
],
"outputs": [
{
"name": "OUTPUT0",
"datatype": "INT32",
"shape": [
-1,
16
]
},
{
"name": "OUTPUT1",
"datatype": "INT32",
"shape": [
-1,
16
]
}
]
}
Do a REST inference call.
seldon model infer tfsimple1 \
'{"inputs":[{"name":"INPUT0","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]},{"name":"INPUT1","data":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],"datatype":"INT32","shape":[1,16]}]}' | jq -M .
{
"model_name": "tfsimple1_1",
"model_version": "1",
"outputs": [
{
"name": "OUTPUT0",
"datatype": "INT32",
"shape": [
1,
16
],
"data": [
2,
4,
6,
8,
10,
12,
14,
16,
18,
20,
22,
24,
26,
28,
30,
32
]
},
{
"name": "OUTPUT1",
"datatype": "INT32",
"shape": [
1,
16
],
"data": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
}
]
}
Do a gRPC inference call
seldon model infer tfsimple1 --inference-mode grpc \
'{"model_name":"tfsimple1","inputs":[{"name":"INPUT0","contents":{"int_contents":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]},"datatype":"INT32","shape":[1,16]},{"name":"INPUT1","contents":{"int_contents":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]},"datatype":"INT32","shape":[1,16]}]}' | jq -M .
{
"modelName": "tfsimple1_1",
"modelVersion": "1",
"outputs": [
{
"name": "OUTPUT0",
"datatype": "INT32",
"shape": [
"1",
"16"
],
"contents": {
"intContents": [
2,
4,
6,
8,
10,
12,
14,
16,
18,
20,
22,
24,
26,
28,
30,
32
]
}
},
{
"name": "OUTPUT1",
"datatype": "INT32",
"shape": [
"1",
"16"
],
"contents": {
"intContents": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
}
}
]
}
Unload the model
seldon model unload tfsimple1
We will use two SKlearn Iris classification models to illustrate an experiment.
cat ./models/sklearn1.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
name: iris
spec:
storageUri: "gs://seldon-models/mlserver/iris"
requirements:
- sklearn
cat ./models/sklearn2.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
name: iris2
spec:
storageUri: "gs://seldon-models/mlserver/iris"
requirements:
- sklearn
Load both models.
seldon model load -f ./models/sklearn1.yaml
seldon model load -f ./models/sklearn2.yaml
{}
{}
Wait for both models to be ready.
seldon model status iris | jq -M .
seldon model status iris2 | jq -M .
{
"modelName": "iris",
"versions": [
{
"version": 1,
"serverName": "mlserver",
"kubernetesMeta": {},
"modelReplicaState": {
"0": {
"state": "Available",
"lastChangeTimestamp": "2023-06-29T14:01:41.362720538Z"
}
},
"state": {
"state": "ModelAvailable",
"availableReplicas": 1,
"lastChangeTimestamp": "2023-06-29T14:01:41.362720538Z"
},
"modelDefn": {
"meta": {
"name": "iris",
"kubernetesMeta": {}
},
"modelSpec": {
"uri": "gs://seldon-models/mlserver/iris",
"requirements": [
"sklearn"
]
},
"deploymentSpec": {
"replicas": 1
}
}
}
]
}
{
"modelName": "iris2",
"versions": [
{
"version": 1,
"serverName": "mlserver",
"kubernetesMeta": {},
"modelReplicaState": {
"0": {
"state": "Available",
"lastChangeTimestamp": "2023-06-29T14:01:41.362845079Z"
}
},
"state": {
"state": "ModelAvailable",
"availableReplicas": 1,
"lastChangeTimestamp": "2023-06-29T14:01:41.362845079Z"
},
"modelDefn": {
"meta": {
"name": "iris2",
"kubernetesMeta": {}
},
"modelSpec": {
"uri": "gs://seldon-models/mlserver/iris",
"requirements": [
"sklearn"
]
},
"deploymentSpec": {
"replicas": 1
}
}
}
]
}
Create an experiment that modifies the iris model to add a second model splitting traffic 50/50 between the two.
cat ./experiments/ab-default-model.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
name: experiment-sample
spec:
default: iris
candidates:
- name: iris
weight: 50
- name: iris2
weight: 50
Start the experiment.
seldon experiment start -f ./experiments/ab-default-model.yaml
Wait for the experiment to be ready.
seldon experiment status experiment-sample -w | jq -M .
{
"experimentName": "experiment-sample",
"active": true,
"candidatesReady": true,
"mirrorReady": true,
"statusDescription": "experiment active",
"kubernetesMeta": {}
}
Run a set of calls and record which route the traffic took. There should be roughly a 50/50 split.
seldon model infer iris -i 100 \
'{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
Success: map[:iris2_1::57 :iris_1::43]
Run one more request
seldon model infer iris \
'{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
{
"model_name": "iris_1",
"model_version": "1",
"id": "fa425bdf-737c-41fe-894d-58868f70fe5d",
"parameters": {},
"outputs": [
{
"name": "predict",
"shape": [
1,
1
],
"datatype": "INT64",
"parameters": {
"content_type": "np"
},
"data": [
2
]
}
]
}
Use sticky session key passed by last infer request to ensure same route is taken each time. We will test REST and gRPC.
seldon model infer iris -s -i 50 \
'{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
Success: map[:iris_1::50]
seldon model infer iris --inference-mode grpc -s -i 50\
'{"model_name":"iris","inputs":[{"name":"input","contents":{"fp32_contents":[1,2,3,4]},"datatype":"FP32","shape":[1,4]}]}'
Success: map[:iris_1::50]
Stop the experiment
seldon experiment stop experiment-sample
Show the requests all go to original model now.
seldon model infer iris -i 100 \
'{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
Success: map[:iris_1::100]
Unload both models.
seldon model unload iris
seldon model unload iris2