A Model is the core atomic building block. It specifies a machine learning artifact that will be loaded onto one of the running Servers. A Model could be:
a standard machine learning inference component such as a TensorFlow, PyTorch, or SKLearn model.
an inference transformation component such as an SKLearn pipeline or a piece of custom Python logic.
a monitoring component such as an outlier detector or drift detector.
an Alibi-Explain model explainer.
An example is shown below for a SKLearn model for iris classification:
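The manifest below is a minimal sketch of such a Model resource; the storage path and requirement tag are illustrative and should be replaced with the location and type of your own artifact.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  # Location of the trained model artifact; any rclone-compatible URI can be used.
  storageUri: "gs://my-bucket/models/iris-sklearn"   # illustrative path
  # Capability tags that a Server must expose in order to host this model.
  requirements:
  - sklearn
```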
Its Kubernetes spec has two core requirements:
A storageUri specifying the location of the artifact. This can be any rclone URI specification.
A requirements list which provides tags that need to be matched by the Server that can run this artifact type. By default, when you install Seldon we provide a set of Servers that cover a range of artifact types.
For Kubernetes usage we provide a set of custom resources for interacting with Seldon.
SeldonRuntime - for installing Seldon in a particular namespace.
Servers - for deploying sets of replicas of core inference servers (MLServer or Triton).
Models - for deploying single machine learning models, custom transformation logic, drift detectors, outliers detectors and explainers.
Experiments - for testing new versions of models.
Pipelines - for connecting together flows of data between models.
SeldonConfig and ServerConfig define the core installation configuration and machine learning inference server configuration for Seldon. Normally, you would not need to customize these but this may be required for your particular custom installation within your organisation.
ServerConfigs - for defining new types of inference server that can be referenced by a Server resource.
SeldonConfig - for defining how Seldon is installed.
Pipelines allow one to connect flows of inference data transformed by Model components. A directed acyclic graph (DAG) of steps can be defined to join Models together. Each Model will need to be capable of receiving a V2 inference request and responding with a V2 inference response. An example Pipeline is shown below:
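The manifest below is an illustrative sketch of such a Pipeline; the step names match the description that follows, and the tensorMap keys assume the V2 output naming used by the tfsimple models.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: tfsimples
spec:
  steps:
    # tfsimple1 and tfsimple2 receive the Pipeline input by default.
    - name: tfsimple1
    - name: tfsimple2
    # tfsimple3 consumes one output tensor from each of the previous steps,
    # renaming them to the INPUT0/INPUT1 tensors it expects.
    - name: tfsimple3
      inputs:
      - tfsimple1
      - tfsimple2
      tensorMap:
        tfsimple1.outputs.OUTPUT0: INPUT0
        tfsimple2.outputs.OUTPUT1: INPUT1
  output:
    steps:
    - tfsimple3
```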
The steps list shows three models: tfsimple1, tfsimple2 and tfsimple3. These three models each take two integer tensors called INPUT0 and INPUT1. The models produce two outputs: OUTPUT0 (the sum of the inputs) and OUTPUT1 (the second input subtracted from the first). tfsimple1 and tfsimple2 take as inputs the input to the Pipeline: the default assumption when no explicit inputs are defined. tfsimple3 takes one V2 tensor input from each of the outputs of tfsimple1 and tfsimple2. As the outputs of tfsimple1 and tfsimple2 have tensors named OUTPUT0 and OUTPUT1, their names need to be changed to match the expected input tensors; this is done with a tensorMap component providing the tensor renaming. This is only required if your models cannot be directly chained together.
The output of the Pipeline is the output from the tfsimple3 model.
The full GoLang specification for a Pipeline is shown below:
An Experiment defines a traffic split between Models or Pipelines. This allows new versions of models and pipelines to be tested.
An experiment spec has three sections:
candidates (required): a set of candidate models to split traffic between. Each candidate has a traffic weight; the percentage of traffic a candidate receives is its weight divided by the sum of all candidate weights.
default (optional): an existing candidate whose endpoint should be modified to split traffic as defined by the candidates.
mirror (optional): a single model to which traffic sent to the candidates is mirrored. Responses from this model will not be returned to the caller.
An example experiment with a default model is shown below:
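A sketch of such an Experiment is shown here; the resource name is illustrative.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-sample   # illustrative name
spec:
  # Expose the traffic split on the existing endpoint of the iris model.
  default: iris
  candidates:
  - name: iris
    weight: 50
  - name: iris2
    weight: 50
```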
This defines a 50% traffic split between two models, iris and iris2. In this case we want to expose this traffic split on the existing endpoint created for the iris model. This allows us to test new versions of models (in this case iris2) on an existing endpoint (in this case iris). The default key defines the model whose endpoint we want to change. The experiment will become active when both underlying models are in Ready status.
An experiment over two separate models which exposes a new API endpoint is shown below:
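A sketch of such an Experiment, reusing the iris and iris2 models from above, is shown here; without a default, the split is exposed on the experiment's own endpoint.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-iris
spec:
  # No default is set, so the split is exposed on a new
  # <experiment-name>.experiment endpoint.
  candidates:
  - name: iris
    weight: 50
  - name: iris2
    weight: 50
```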
To call the endpoint, add the header seldon-model: <experiment-name>.experiment, in this case seldon-model: experiment-iris.experiment. With curl, for example, this header can be passed using the -H flag.
For examples see the local experiments notebook.
Running an experiment between pipelines is very similar. The difference is that resourceType: pipeline needs to be defined, and in this case the candidates or mirrors will refer to pipelines. An example is shown below:
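A sketch is shown here; the pipeline names pipeline1 and pipeline2 are hypothetical.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-pipelines
spec:
  resourceType: pipeline
  default: pipeline1            # hypothetical pipeline name
  candidates:
  - name: pipeline1
    weight: 50
  - name: pipeline2             # hypothetical pipeline name
    weight: 50
```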
For an example see the local experiments notebook.
A mirror can be added easily for model or pipeline experiments. An example model mirror experiment is shown below:
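A sketch is shown here, reusing the iris models from the earlier examples and assuming the mirror takes a name and a percentage of traffic to copy.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-iris-mirror
spec:
  candidates:
  - name: iris
    weight: 100
  # Traffic is copied to iris2, but its responses are discarded.
  mirror:
    name: iris2
    percent: 100
```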
For an example see the local experiments notebook.
An example pipeline mirror experiment is shown below:
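A sketch is shown here; the pipeline names are hypothetical and the mirror fields follow the same assumptions as the model mirror example above.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-pipelines-mirror
spec:
  resourceType: pipeline
  candidates:
  - name: pipeline1             # hypothetical pipeline name
    weight: 100
  mirror:
    name: pipeline2             # hypothetical pipeline name
    percent: 100
```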
For an example see the local experiments notebook.
To allow cohorts to get consistent views in an experiment, each inference request passes back a response header x-seldon-route which can be passed in future requests to an experiment to bypass the random traffic splits and get a prediction from the sequence of models and pipelines used in the initial request.
Note: you must pass the normal seldon-model header along with the x-seldon-route header.
This is illustrated in the local experiments notebook.
Caveats: the models used will be the same but not necessarily the same replica instances. This means that, at present, this will not work for stateful models whose requests need to be routed to the same model replica instance.
As an alternative you can choose to run experiments at the service mesh level if you use one of the popular service meshes that allow header based routing in traffic splits. For further discussion see here.
This section is for advanced usage where you want to define new types of inference servers.
Server configurations define how to create an inference server. By default one is provided for Seldon MLServer and one for NVIDIA Triton Inference Server. Both of these servers support the V2 inference protocol, which is a requirement for all inference servers. Each ServerConfig defines the Kubernetes ReplicaSet for the server, which includes the Seldon Agent reverse proxy as well as an Rclone server for downloading artifacts for the server. The Kustomize ServerConfig for MLServer is shown below:
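The full configuration is lengthy, so the heavily abbreviated sketch below only illustrates its overall shape; the container images and tags are placeholders rather than the published defaults.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: ServerConfig
metadata:
  name: mlserver
spec:
  podSpec:
    containers:
    # Rclone sidecar used to download model artifacts onto the server.
    - name: rclone
      image: rclone/rclone:latest           # placeholder tag
    # Seldon Agent reverse proxy that manages model loading and routing.
    - name: agent
      image: seldonio/seldon-agent:latest   # placeholder tag
    # The inference server itself.
    - name: mlserver
      image: seldonio/mlserver:latest       # placeholder tag
```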
The SeldonRuntime resource is used to create an instance of Seldon installed in a particular namespace.
For the definition of SeldonConfiguration above, see the SeldonConfig resource.
The specification above contains overrides for the chosen SeldonConfig. To override the PodSpec for a given component, the overrides field needs to specify the component name and the PodSpec needs to specify the container name, along with the fields to override.
For instance, the following overrides the resource limits for cpu and memory in the hodometer component in the seldon-mesh namespace, while using values specified in the seldonConfig elsewhere (e.g. default).
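A sketch of such an override is shown here; the resource limit values are illustrative.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: SeldonRuntime
metadata:
  name: seldon
  namespace: seldon-mesh
spec:
  seldonConfig: default
  overrides:
  - name: hodometer
    podSpec:
      containers:
      - name: hodometer
        resources:
          limits:
            cpu: 200m       # illustrative values
            memory: 32Mi
```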
As a minimal use, you should just define the SeldonConfig to use as a base for this install, for example to install in the seldon-mesh namespace with the SeldonConfig named default:
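A minimal sketch of such a resource:

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: SeldonRuntime
metadata:
  name: seldon
  namespace: seldon-mesh
spec:
  # Use the default SeldonConfig as the base for this installation.
  seldonConfig: default
```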
The helm chart seldon-core-v2-runtime allows easy creation of this resource and associated default Servers for an installation of Seldon in a particular namespace.
This section is for advanced usage where you want to define how Seldon is installed in each namespace.
The SeldonConfig resource defines the core installation components installed by Seldon. If you wish to install Seldon, you can use the SeldonRuntime resource, which allows easy overriding of some parts defined in this specification. In general, we advise core DevOps teams to use the default SeldonConfig or customize it for their usage. Individual installations of Seldon can then use the SeldonRuntime with a few overrides for special customisation needed in that namespace.
The specification contains core PodSpecs for each core component and a section for general configuration including the ConfigMaps that are created for the Agent (rclone defaults), Kafka and Tracing (open telemetry).
Some of these values can be overridden on a per-namespace basis via the SeldonRuntime resource. Labels and annotations can also be set at the component level; these will be merged with the labels and annotations from the SeldonConfig resource in which they are defined and added to the component's corresponding Deployment or StatefulSet.
The default configuration is shown below.
The default installation will provide two initial servers: one MLServer and one Triton. You only need to define additional servers for advanced use cases.
A Server defines an inference server onto which models will be placed for inference. By default, on installation two server StatefulSets will be deployed: one MLServer and one Triton. An example Server definition is shown below:
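A minimal sketch of a Server resource, assuming the default mlserver ServerConfig:

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver
spec:
  # Reference to the ServerConfig that defines how this server is created.
  serverConfig: mlserver
  replicas: 1
```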
The main requirement is a reference to a ServerConfig resource, in this case mlserver.
One can easily utilize a custom image with the existing ServerConfigs. For example, the following defines an MLServer server with a custom image:
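A sketch is shown here; the image name is hypothetical and only the inference server container is overridden.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-custom
spec:
  serverConfig: mlserver
  podSpec:
    containers:
    # Override only the inference server container image.
    - name: mlserver
      image: registry.example.com/my-mlserver:1.0   # hypothetical image
```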
This server can then be targeted by a particular model by specifying this server name when creating the model, for example:
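A sketch of a Model pinned to that server; the storage location is illustrative.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris-custom
spec:
  storageUri: "gs://my-bucket/models/iris-sklearn"   # illustrative path
  requirements:
  - sklearn
  # Pin this model to the custom server defined above.
  server: mlserver-custom
```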
One can also create a Server definition to add a persistent volume to your server. This can be used to allow models to be loaded directly from the persistent volume.
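A sketch, assuming a pre-existing PersistentVolumeClaim named ml-models-pvc and a mount path of /var/models; both names, and the choice to mount into the rclone container, are assumptions to adapt to your ServerConfig.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-pvc
spec:
  serverConfig: mlserver
  podSpec:
    volumes:
    - name: models-volume
      persistentVolumeClaim:
        claimName: ml-models-pvc           # hypothetical PVC
    containers:
    # Mount the volume so artifacts can be read directly from the
    # persistent volume instead of being downloaded.
    - name: rclone
      volumeMounts:
      - name: models-volume
        mountPath: /var/models             # hypothetical mount path
```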
The server can be targeted by a model whose artifact is on the persistent volume as shown below.
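A sketch, continuing the hypothetical mount path above:

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris-pvc
spec:
  # Artifact is read from the path mounted from the persistent volume.
  storageUri: "/var/models/iris"   # hypothetical path on the PVC
  requirements:
  - sklearn
  server: mlserver-pvc
```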
When a SeldonConfig resource changes, any SeldonRuntime resources that reference the changed SeldonConfig will also be updated immediately. If this behaviour is not desired, you can set spec.disableAutoUpdate in the SeldonRuntime resource so that it is not updated immediately but only when it changes or any owned resource changes.
A fully worked example for this can be found in the examples.
An alternative would be to create your own ServerConfig for more complex use cases, or if you want to standardise the Server definition in one place.