Quickstart
In this notebook, we will demonstrate how to deploy a production-ready AI application with Seldon Core 2. This application will have two components - an sklearn model and a preprocessor written in Python - connected using Core 2 Pipelines. Once deployed, users will have an endpoint available for calling the application. The inference logic can be visualized with the following diagram:
To do this we will:
Set up a Server resource to deploy our models
Deploy an sklearn Model
Deploy a multi-step Pipeline, including a preprocessing step that will be run before calling our model
Call our inference endpoint, and observe data within our pipeline
Step 1: Deploy a Custom Server
As part of the Core 2 installation, you will have MLServer and Triton servers installed:
```bash
!kubectl get servers -n seldon-mesh
```
```
NAME              READY   REPLICAS   LOADED MODELS   AGE
mlserver          True    1          0               156d
mlserver-custom   True    1          0               38d
triton            True    1          0               156d
```
The Server resource outlines attributes (dependency requirements, underlying infrastructure) for the runtimes that the models you deploy will run on. By default, MLServer supports the following frameworks out of the box: alibi-detect, alibi-explain, huggingface, lightgbm, mlflow, python, sklearn, spark-mlib, xgboost.
In this example, we will create a new custom MLServer Server that we will tag with income-classifier-deps under capabilities (see the docs here) in order to define which Models will be matched to this Server. We will deploy both our model (sklearn) and our preprocessor (python) on the same Server. This is done using the manifest below:
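A minimal sketch of what this Server manifest could look like is shown below (the Server name matches the mlserver-custom entry seen earlier; adapt the namespace and replica count to your cluster):

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-custom
  namespace: seldon-mesh
spec:
  # Reuse the stock MLServer server configuration
  serverConfig: mlserver
  # Models whose requirements include this tag will be scheduled onto this Server
  capabilities:
    - income-classifier-deps
  replicas: 1
```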
Step 2: Deploy Models
Now we will deploy a model - in this case, a classification model that has been trained to take 12 features as input and output [0] or [1], representing a [Yes] or [No] prediction of whether or not an adult with given values for those 12 features is making more than $50K/yr. The model was trained on the Census Income (or "Adult") dataset, extracted by Barry Becker from the 1994 Census database. See here for more details.
The model artefact is currently stored in a Google Cloud Storage bucket owned by Seldon - the contents of the relevant folder are shown below. Alongside our model artefact, we have a model-settings.json file to help locate and load the model. For more information on the inference artefacts we support and how to configure them, see here.
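As a rough illustration (the bucket path and artefact file name below are placeholders, not the real location), such a folder contains the serialised model next to its model-settings.json:

```
gs://<bucket>/income-classifier/   # placeholder path
├── model.joblib                   # serialised sklearn model
└── model-settings.json            # tells MLServer how to load the artefact
```

A minimal model-settings.json for MLServer's sklearn runtime could look like:

```json
{
  "name": "income-classifier",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}
```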
In our Model manifest below, we point to the location of the model artefact using the storageUri field. You will also notice that we have defined income-classifier-deps under requirements. This will match the Model to the Server we deployed above, as Models will only be deployed onto Servers that have capabilities that match the appropriate requirements defined in the Model manifest.
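A sketch of such a Model manifest (the storageUri below is a placeholder for the real bucket location):

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: income-classifier
  namespace: seldon-mesh
spec:
  # Placeholder: point this at the folder holding the artefact and model-settings.json
  storageUri: gs://<bucket>/income-classifier
  requirements:
    - income-classifier-deps   # matches the capability on our custom Server
```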
In order to deploy the model, we will apply the manifest to our cluster:
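For example, assuming the manifest above is saved as income-classifier.yaml:

```bash
kubectl apply -f income-classifier.yaml -n seldon-mesh
# Wait for the Model to be scheduled and loaded onto the Server
kubectl wait --for condition=ready --timeout=300s model income-classifier -n seldon-mesh
```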
We now have a deployed model, with an associated endpoint.
Make Requests
The endpoint that has been exposed by the above deployment will use an IP from our service mesh that we can obtain as follows:
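For example, if Istio is the service mesh in front of Core 2, the ingress IP can be fetched along these lines (service and namespace names depend on your installation):

```bash
# Assumes an Istio ingress gateway; adjust for your own mesh setup
MESH_IP=$(kubectl get svc istio-ingressgateway -n istio-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $MESH_IP
```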
Requests are made using the Open Inference Protocol. More details on this specification can be found in our docs, or, for gRPC usage, in the API documentation generated from our protocol buffers here. This protocol is also shared with Triton Inference Server for serving deep learning models.
We are now ready to send a request!
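A REST request could look like the following sketch; the input tensor name and feature values are illustrative, and the Seldon-Model header routes the request to our deployed model:

```bash
# Illustrative values for the 12 encoded census features
curl -s http://${MESH_IP}/v2/models/income-classifier/infer \
  -H "Content-Type: application/json" \
  -H "Seldon-Model: income-classifier" \
  -d '{
        "inputs": [
          {
            "name": "INPUT0",
            "shape": [1, 12],
            "datatype": "INT64",
            "data": [[47, 4, 1, 1, 1, 3, 4, 1, 0, 0, 40, 9]]
          }
        ]
      }'
```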
We can see above that the model returned 'data': [0] in the output. This is the model's prediction, indicating that an individual with the attributes provided is most likely making more than $50K/yr.
Step 3: Create and Deploy a 2-step Pipeline
Often we'd like to deploy AI applications that are more complex than an individual model. For example, around our model we could deploy pre- or post-processing steps, custom logic, other ML models, or drift and outlier detectors.
Deploy a Preprocessing step written in Python
In this example, we will create a preprocessing step that extracts numerical values from a text file for the model to use as input. This will be implemented with custom logic in Python and deployed as a custom model with MLServer:
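Below is a minimal sketch of what such a custom MLServer runtime could look like, assuming the raw text reaches the runtime as a string tensor with one comma-separated record per element (file and class names are illustrative):

```python
# model.py - illustrative custom MLServer runtime for the preprocessing step
import numpy as np
from mlserver import MLModel
from mlserver.codecs import NumpyCodec, StringCodec
from mlserver.types import InferenceRequest, InferenceResponse


class Preprocessor(MLModel):
    async def load(self) -> bool:
        # Nothing to load for this simple runtime
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the incoming text records, e.g. ["47,4,1,1,1,3,4,1,0,0,40,9"]
        raw = StringCodec.decode_input(payload.inputs[0])

        # Extract the numerical values and shape them as the classifier expects
        features = np.array(
            [[int(token) for token in record.split(",")] for record in raw],
            dtype=np.int64,
        )

        # Return the features as the OUTPUT0 tensor consumed by the next step
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output(name="OUTPUT0", payload=features)],
        )
```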
Before deploying the preprocessing step with Core 2, we will test it locally:
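One way to do this, assuming the script and its model-settings.json sit in a local preprocessor/ folder, is to serve it with MLServer and send a test request (payload is illustrative):

```bash
# Serve the preprocessor locally on MLServer's default HTTP port (8080)
mlserver start preprocessor/ &
sleep 5

# Send a single comma-separated record as a string (BYTES) tensor
curl -s http://localhost:8080/v2/models/income-preprocessor/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "INPUT0", "shape": [1], "datatype": "BYTES", "data": ["47,4,1,1,1,3,4,1,0,0,40,9"]}]}'
```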
Now that we've tested the Python script locally, we will deploy the preprocessing step as a Model. This will allow us to connect it to our sklearn model using a Seldon Pipeline. To do so, we store an inference artefact (in this case, our Python script) in our cloud storage alongside a model-settings.json file, similar to the model deployed above.
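For a custom Python runtime, the model-settings.json simply points MLServer at the runtime class; a minimal sketch, assuming the script above is stored as model.py:

```json
{
  "name": "income-preprocessor",
  "implementation": "model.Preprocessor"
}
```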
As with the ML model deployed above, we have defined income-classifier-deps under requirements. This means that both the preprocessor and the model will be deployed onto the same Server, consolidating resources and reducing overhead (for more about Multi-Model Serving, see here).
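The corresponding Model manifest could look like the following sketch (the storageUri is a placeholder for the folder holding model.py and model-settings.json):

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: income-preprocessor
  namespace: seldon-mesh
spec:
  # Placeholder: folder containing the Python script and model-settings.json
  storageUri: gs://<bucket>/income-preprocessor
  requirements:
    - income-classifier-deps   # schedules the preprocessor onto the same custom Server
```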
We've now deployed the preprocessing step! Let's test it out by calling its endpoint:
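As before, the request goes over the Open Inference Protocol via the mesh IP; the payload below is illustrative:

```bash
curl -s http://${MESH_IP}/v2/models/income-preprocessor/infer \
  -H "Content-Type: application/json" \
  -H "Seldon-Model: income-preprocessor" \
  -d '{"inputs": [{"name": "INPUT0", "shape": [1], "datatype": "BYTES", "data": ["47,4,1,1,1,3,4,1,0,0,40,9"]}]}'
```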
Create and Deploy a Pipeline connecting our deployed Models
Now that we have our preprocessor and model deployed, we will chain them together with a pipeline.
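A sketch of such a Pipeline manifest, using the step and tensor names from the models above:

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: income-pipeline
  namespace: seldon-mesh
spec:
  steps:
    - name: income-preprocessor
    - name: income-classifier
      inputs:
        - income-preprocessor.outputs.OUTPUT0
      tensorMap:
        # Feed the preprocessor's OUTPUT0 tensor into the classifier's INPUT0 tensor
        income-preprocessor.outputs.OUTPUT0: INPUT0
  output:
    steps:
      - income-classifier
```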
The YAML defines two steps in the pipeline (the preprocessor and the model), mapping the output of the preprocessor model (OUTPUT0) to the input of the income classification model (INPUT0). Seldon Core uses Kafka to communicate between models, meaning that all data is streamed and observable in real time.
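Assuming the manifest above is saved as income-pipeline.yaml, deploying the pipeline and calling its endpoint could look like this (the Seldon-Model header uses the <pipeline-name>.pipeline form to route to the pipeline rather than an individual model):

```bash
kubectl apply -f income-pipeline.yaml -n seldon-mesh
kubectl wait --for condition=ready --timeout=300s pipeline income-pipeline -n seldon-mesh

# Illustrative payload: the raw text record enters the pipeline at the preprocessor
curl -s http://${MESH_IP}/v2/models/income-pipeline/infer \
  -H "Content-Type: application/json" \
  -H "Seldon-Model: income-pipeline.pipeline" \
  -d '{"inputs": [{"name": "INPUT0", "shape": [1], "datatype": "BYTES", "data": ["47,4,1,1,1,3,4,1,0,0,40,9"]}]}'
```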
Congratulations! You have now deployed a Seldon Pipeline that exposes an endpoint for your ML application 🥳. For more tutorials on how to use Core 2 for various use cases and requirements, see here.
Clean Up
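To remove the resources created in this quickstart (names follow the illustrative manifests above):

```bash
kubectl delete pipeline income-pipeline -n seldon-mesh
kubectl delete model income-preprocessor income-classifier -n seldon-mesh
kubectl delete server mlserver-custom -n seldon-mesh
```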