MLServer is used as the core Python inference server in Seldon Core. Therefore, it should be straightforward to deploy your models either by using one of the built-in pre-packaged servers or by pointing to a custom image of MLServer.
This section assumes a basic knowledge of Seldon Core and Kubernetes, as well as access to a working Kubernetes cluster with Seldon Core installed. To learn more about Seldon Core or how to install it, please visit the Seldon Core documentation.
Out of the box, Seldon Core comes with a few MLServer runtimes pre-configured to run straight away. This allows you to deploy an MLServer instance by just pointing to where your model artifact is stored and specifying which ML framework was used to train it.
To let Seldon Core know what framework was used to train your model, you can use the `implementation` field of your `SeldonDeployment` manifest. For example, to deploy a Scikit-Learn artifact stored remotely in GCS, one could do:
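(The snippet below is a minimal sketch; the deployment name and `modelUri` path are illustrative placeholders.)

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-sklearn-model                        # illustrative name
spec:
  protocol: kfserving                           # serve the model using the V2 inference protocol
  predictors:
    - name: default
      graph:
        name: classifier
        implementation: SKLEARN_SERVER          # pre-packaged server backed by MLServer's SKLearn runtime
        modelUri: gs://my-bucket/sklearn-model  # illustrative GCS path to the model artifact
```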
As you can see in the example above, all that we need to specify is that:
- Our inference deployment should use the V2 inference protocol, which is done by setting the `protocol` field to `kfserving`.
- Our model artifact is a serialised Scikit-Learn model; therefore, it should be served using the MLServer SKLearn runtime, which is done by setting the `implementation` field to `SKLEARN_SERVER`.
Note that, while the `protocol` field should always be set to `kfserving` (i.e. so that models are served using the V2 inference protocol), the value of the `implementation` field will depend on your ML framework. The valid values of the `implementation` field are pre-determined by Seldon Core. However, it should also be possible to configure and add new ones (e.g. to support a custom MLServer runtime).
Once you have your `SeldonDeployment` manifest ready, the next step is to apply it to your cluster. There are multiple ways to do this, but the simplest is probably to apply it directly through `kubectl`, by running:
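```bash
# Illustrative filename; adjust to wherever you saved the SeldonDeployment manifest above.
kubectl apply -f my-sklearn-deployment.yaml
```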
To consult the supported values of the `implementation` field where MLServer is used, you can check the support table below.
As mentioned above, pre-packaged servers come built into Seldon Core. Therefore, only a pre-determined subset of them will be supported for a given release of Seldon Core.
The table below lists the currently supported values of the `implementation` field. Each row also shows which ML framework it corresponds to, as well as which MLServer runtime will be enabled internally on your model deployment when used.
Note that, on top of the ones listed in the table below (i.e. those backed by MLServer), Seldon Core also provides a wider set of pre-packaged servers. To check the full list, please visit the Seldon Core documentation.
There could be cases where the pre-packaged MLServer runtimes supported out of the box in Seldon Core are not enough for our use case. The framework provided by MLServer makes it easy to write custom runtimes, which can then get packaged up as images. These images become self-contained model servers with your custom runtime, and Seldon Core makes it just as easy to deploy them into your serving infrastructure.
The `componentSpecs` field of the `SeldonDeployment` manifest will allow us to let Seldon Core know what image should be used to serve a custom model. For example, if we assume that our custom image has been tagged as `my-custom-server:0.1.0`, we could write our `SeldonDeployment` manifest as follows:
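(The snippet below is a minimal sketch; apart from the `my-custom-server:0.1.0` tag mentioned above, all names are illustrative placeholders.)

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-custom-model                         # illustrative name
spec:
  protocol: v2                                  # serve the model using the V2 inference protocol
  predictors:
    - name: default
      graph:
        name: classifier                        # must match the container name below
        type: MODEL
      componentSpecs:
        - spec:
            containers:
              - name: classifier
                image: my-custom-server:0.1.0   # our custom MLServer image
```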
As we can see in the snippet above, all that's needed to deploy a custom MLServer image is:
- Letting Seldon Core know that the model deployment will be served through the V2 inference protocol, by setting the `protocol` field to `v2`.
- Pointing our model container to our custom MLServer image, by specifying it in the `image` field of the `componentSpecs` section of the manifest.
Once you have your `SeldonDeployment` manifest ready, the next step is to apply it to your cluster. There are multiple ways to do this, but the simplest is probably to apply it directly through `kubectl`, by running:
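```bash
# Illustrative filename; adjust to wherever you saved the SeldonDeployment manifest above.
kubectl apply -f my-custom-deployment.yaml
```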
Framework | MLServer Runtime | Seldon Core Pre-packaged Server | Documentation |
---|---|---|---|
Scikit-Learn | `mlserver-sklearn` | `SKLEARN_SERVER` | |
XGBoost | `mlserver-xgboost` | `XGBOOST_SERVER` | |
MLflow | `mlserver-mlflow` | `MLFLOW_SERVER` | |
MLServer is used as the core Python inference server in KServe (formerly known as KFServing). This allows for a straightforward avenue to deploy your models into a scalable serving infrastructure backed by Kubernetes.
This section assumes a basic knowledge of KServe and Kubernetes, as well as access to a working Kubernetes cluster with KServe installed. To learn more about KServe or how to install it, please visit the KServe documentation.
KServe provides built-in serving runtimes to deploy models trained in common ML frameworks. These allow you to deploy your models into a robust infrastructure by just pointing to where the model artifacts are stored remotely.
Some of these runtimes leverage MLServer as the core inference server. Therefore, it should be straightforward to move from your local testing to your serving infrastructure.
To use any of the built-in serving runtimes offered by KServe, it should be enough to select the relevant one in your `InferenceService` manifest. For example, to serve a Scikit-Learn model, you could use a manifest like the one below:
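(The snippet below is a minimal sketch; the name and `storageUri` are illustrative placeholders.)

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-sklearn-model                        # illustrative name
spec:
  predictor:
    sklearn:
      protocolVersion: v2                       # serve the model using the V2 inference protocol
      storageUri: gs://my-bucket/sklearn-model  # illustrative path to the model artifact
```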
As you can see in the example above, the `InferenceService` manifest only needs to specify the following points:
- The model artifact is a Scikit-Learn model. Therefore, we will use the `sklearn` serving runtime to deploy it.
- The model will be served using the V2 inference protocol, which can be enabled by setting the `protocolVersion` field to `v2`.
Once you have your `InferenceService` manifest ready, the next step is to apply it to your cluster. There are multiple ways to do this, but the simplest is probably to apply it directly through `kubectl`, by running:
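```bash
# Illustrative filename; adjust to wherever you saved the InferenceService manifest above.
kubectl apply -f my-sklearn-isvc.yaml
```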
As mentioned above, KServe offers support for built-in serving runtimes, some of which leverage MLServer as the inference server. Below you can find a table listing these runtimes, and the MLServer inference runtime that they correspond to.
Note that, on top of the ones listed in the table below (i.e. those backed by MLServer), KServe also provides a wider set of serving runtimes. To see the full list, please visit the KServe documentation.
Sometimes, the serving runtimes built into KServe may not be enough for our use case. The framework provided by MLServer makes it easy to write custom runtimes, which can then get packaged up as images. These images become self-contained model servers with your custom runtime, and it's easy to deploy them into your serving infrastructure by leveraging KServe's support for custom runtimes.
The `InferenceService` manifest gives you full control over the containers used to deploy your machine learning model. This can be leveraged to point your deployment to the custom MLServer image containing your custom logic. For example, if we assume that our custom image has been tagged as `my-custom-server:0.1.0`, we could write an `InferenceService` manifest like the one below:
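(The snippet below is a minimal sketch; apart from the `my-custom-server:0.1.0` tag mentioned above, the name is an illustrative placeholder, and the `PROTOCOL` environment variable and port value are assumptions that may need adjusting for your KServe version and custom image configuration.)

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-custom-model                 # illustrative name
spec:
  predictor:
    containers:
      - name: classifier
        image: my-custom-server:0.1.0   # our custom MLServer image
        env:
          - name: PROTOCOL              # assumption: request the V2 inference protocol
            value: v2
        ports:
          - containerPort: 8080         # assumption: MLServer's default HTTP port
            protocol: TCP
```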
As we can see in the example above, the main points that we'll need to take into account are:
- Pointing to our custom MLServer image in the custom container section of our `InferenceService`.
- Explicitly choosing the V2 inference protocol to serve our model.
- Letting KServe know which port our custom container exposes for inference requests.
Once you have your `InferenceService` manifest ready, the next step is to apply it to your cluster. There are multiple ways to do this, but the simplest is probably to apply it directly through `kubectl`, by running:
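```bash
# Illustrative filename; adjust to wherever you saved the InferenceService manifest above.
kubectl apply -f my-custom-isvc.yaml
```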
Framework | MLServer Runtime | KServe Serving Runtime | Documentation |
---|---|---|---|
Scikit-Learn | `mlserver-sklearn` | `sklearn` | |
XGBoost | `mlserver-xgboost` | `xgboost` | |

MLServer is currently used as the core Python inference server in some of the most popular Kubernetes-native serving frameworks, including Seldon Core and KServe (formerly known as KFServing). This allows MLServer users to leverage the usability and maturity of these frameworks to take their model deployments to the next level of their MLOps journey, ensuring that they are served in a robust and scalable infrastructure.

In general, it should be possible to deploy models using MLServer into any serving engine compatible with the V2 protocol. Alternatively, it's also possible to manage MLServer deployments manually as regular processes (i.e. in a non-Kubernetes-native way). However, this may be more involved and highly dependent on the deployment infrastructure.
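For instance, a model folder prepared for MLServer (i.e. one containing a `model-settings.json`) can be served as a plain local process with the MLServer CLI; the sketch below assumes such a folder exists in the current directory:

```bash
# Start MLServer as a regular process, serving the models defined in the current folder
mlserver start .
```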