MLServer is used as the core Python inference server in Seldon Core. Therefore, it should be straightforward to deploy your models either by using one of the built-in pre-packaged servers or by pointing to a custom image of MLServer.
This section assumes a basic knowledge of Seldon Core and Kubernetes, as well as access to a working Kubernetes cluster with Seldon Core installed. To learn more about Seldon Core or how to install it, please visit the Seldon Core documentation.
Out of the box, Seldon Core comes with a few MLServer runtimes pre-configured to run straight away. This allows you to deploy an MLServer instance by just pointing to where your model artifact is stored and specifying what ML framework was used to train it.
To let Seldon Core know what framework was used to train your model, you can use the `implementation` field of your `SeldonDeployment` manifest. For example, to deploy a Scikit-Learn artifact stored remotely in GCS, one could do:
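The exact manifest will depend on your model, but a minimal sketch might look like the following (the deployment name and the GCS path are illustrative placeholders):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn-iris          # illustrative deployment name
spec:
  protocol: kfserving         # serve the model through the V2 inference protocol
  predictors:
    - name: default
      graph:
        name: classifier
        implementation: SKLEARN_SERVER          # use the MLServer SKLearn runtime
        modelUri: gs://my-bucket/sklearn/iris   # illustrative GCS path to the model artifact
```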
As we can see in the example above, all that we need to specify is that:

- Our inference deployment should use the V2 inference protocol, which is done by setting the `protocol` field to `kfserving`.
- Our model artifact is a serialised Scikit-Learn model, so it should be served using the MLServer SKLearn runtime, which is done by setting the `implementation` field to `SKLEARN_SERVER`.
Note that, while the `protocol` should always be set to `kfserving` (i.e. so that models are served using the V2 inference protocol), the value of the `implementation` field will depend on your ML framework. The valid values of the `implementation` field are pre-determined by Seldon Core. However, it should also be possible to configure and add new ones (e.g. to support a custom MLServer runtime).
Once you have your `SeldonDeployment` manifest ready, the next step is to apply it to your cluster. There are multiple ways to do this, but the simplest is probably to just apply it directly through `kubectl`, by running:
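For instance, assuming the manifest above was saved locally as `my-model.yaml` (an illustrative filename), that would be:

```bash
kubectl apply -f my-model.yaml
```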
To consult the supported values of the `implementation` field where MLServer is used, you can check the support table below.
As mentioned above, pre-packaged servers come built into Seldon Core. Therefore, only a pre-determined subset of them will be supported for a given release of Seldon Core.
The table below shows a list of the currently supported values of the `implementation` field. Each row also shows which ML framework it corresponds to and which MLServer runtime will be enabled internally on your model deployment when it is used.
| ML Framework | `implementation` value | MLServer runtime |
| --- | --- | --- |
| Scikit-Learn | `SKLEARN_SERVER` | `mlserver-sklearn` |
| XGBoost | `XGBOOST_SERVER` | `mlserver-xgboost` |
| MLflow | `MLFLOW_SERVER` | `mlserver-mlflow` |
Note that, on top of the ones shown above (backed by MLServer), Seldon Core also provides a wider set of pre-packaged servers. To check the full list, please visit the Seldon Core documentation.
There could be cases where the pre-packaged MLServer runtimes supported out of the box in Seldon Core are not enough for our use case. The framework provided by MLServer makes it easy to write custom runtimes, which can then be packaged up as container images. These images become self-contained model servers bundling your custom runtime, and Seldon Core makes it just as easy to deploy them into your serving infrastructure.
The `componentSpecs` field of the `SeldonDeployment` manifest allows us to let Seldon Core know what image should be used to serve a custom model. For example, if we assume that our custom image has been tagged as `my-custom-server:0.1.0`, we could write our `SeldonDeployment` manifest as follows:
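A minimal sketch of such a manifest is shown below; the deployment name and the `classifier` container name are illustrative placeholders, with the container name matching the name of the graph node:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-custom-model       # illustrative deployment name
spec:
  protocol: v2                # serve the model through the V2 inference protocol
  predictors:
    - name: default
      graph:
        name: classifier
        type: MODEL
      componentSpecs:
        - spec:
            containers:
              - name: classifier                 # must match the graph node name above
                image: my-custom-server:0.1.0    # our custom MLServer image
```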
As we can see in the snippet above, all that's needed to deploy a custom MLServer image is:

- Letting Seldon Core know that the model deployment will be served through the V2 inference protocol, by setting the `protocol` field to `v2`.
- Pointing our model container to our custom MLServer image, by specifying it in the `image` field of the `componentSpecs` section of the manifest.
Once you have your `SeldonDeployment` manifest ready, the next step is to apply it to your cluster. There are multiple ways to do this, but the simplest is probably to just apply it directly through `kubectl`, by running:
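As before, assuming the manifest above was saved locally as `my-custom-model.yaml` (an illustrative filename), that would be:

```bash
kubectl apply -f my-custom-model.yaml
```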