Quickstart
In this notebook, we will demonstrate how to deploy a production-ready AI application with Seldon Core 2. This application has two components - an sklearn model and a preprocessor written in Python - connected using Core 2 Pipelines. Once deployed, the application exposes an inference endpoint that users can call. The inference logic is visualized in the following diagram:
To do this we will:
Set up a Server resource to deploy our models
Deploy an sklearn Model
Deploy a multi-step Pipeline, including a preprocessing step that will be run before calling our model.
Call our inference endpoint, and observe data within our pipeline
Step 1: Deploy a Custom Server
As part of the Core 2 installation, MLServer and Triton Servers will already have been installed:
!kubectl get servers -n seldon-mesh
NAME READY REPLICAS LOADED MODELS AGE
mlserver True 1 0 156d
mlserver-custom True 1 0 38d
triton True 1 0 156d
The Server resource outlines attributes (dependency requirements, underlying infrastructure) of the runtimes that your deployed models will run on. By default, MLServer supports the following frameworks out of the box: alibi-detect, alibi-explain, huggingface, lightgbm, mlflow, python, sklearn, spark-mlib, and xgboost.
In this example, we will create a new custom MLServer Server that we tag with income-classifier-deps under capabilities (see docs here) in order to define which Models will be matched to this Server. We will deploy both our model (sklearn) and our preprocessor (python) on this same Server. This is done using the manifest below:
!cat ../../../samples/quickstart/servers/mlserver-custom.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
name: mlserver-custom
spec:
serverConfig: mlserver
capabilities:
- income-classifier-deps
podSpec:
containers:
- image: seldonio/mlserver:1.6.0
name: mlserver
!kubectl apply -f ../../../samples/quickstart/servers/mlserver-custom.yaml -n seldon-mesh
server.mlops.seldon.io/mlserver-custom unchanged
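The Server may take a few moments to start. Optionally, you can wait for it to report ready before loading any models onto it:
!kubectl wait --for condition=ready --timeout=300s server mlserver-custom -n seldon-mesh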
Step 2: Deploy Models
Now we will deploy a model. In this case, we are deploying a classification model trained to take 12 features as input and output 0 or 1, representing a Yes or No prediction of whether an adult with the given feature values is making more than $50K/yr. This model was trained on the Census Income (or "Adult") dataset, extracted by Barry Becker from the 1994 Census database. See here for more details.
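For reference, the 12 input features, in the order used by the inference requests later in this notebook, are listed below (a convenience list for readability; it is not required for deployment):
# The 12 input features expected by the income classifier, in order
# (matching the requests sent later in this notebook).
FEATURE_NAMES = [
    "Age", "Workclass", "Education", "Marital Status", "Occupation",
    "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
    "Hours per week", "Country",
]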
The model artefact is currently stored in a Google Cloud Storage bucket owned by Seldon; the contents of the relevant folder are shown below. Alongside the model artefact, we have a model-settings.json file that tells the Server how to locate and load the model. For more information on the inference artefacts we support and how to configure them, see here.
!gcloud storage ls --recursive gs://seldon-models/scv2/samples/mlserver_1.4.0/income-sklearn/classifier
gs://seldon-models/scv2/samples/mlserver_1.4.0/income-sklearn/classifier/:
gs://seldon-models/scv2/samples/mlserver_1.4.0/income-sklearn/classifier/model-settings.json
gs://seldon-models/scv2/samples/mlserver_1.4.0/income-sklearn/classifier/model.joblib
!gsutil cat gs://seldon-models/scv2/samples/mlserver_1.4.0/income-sklearn/classifier/model-settings.json
{
"name": "income",
"implementation": "mlserver_sklearn.SKLearnModel",
"parameters": {
"uri": "./model.joblib",
"version": "v0.1.0"
}
}
In our Model manifest below, we point to the location of the model artefact using the storageUri field. You will also notice that we have defined income-classifier-deps under requirements. This matches the Model to the Server we deployed above, as Models will only be deployed onto Servers whose capabilities match the requirements defined in the Model manifest.
!cat ../../../samples/quickstart/models/sklearn-income-classifier.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
name: income-classifier
spec:
storageUri: "gs://seldon-models/scv2/samples/mlserver_1.4.0/income-sklearn/classifier"
requirements:
- income-classifier-deps
memory: 100Ki
In order to deploy the model, we will apply the manifest to our cluster:
!kubectl apply -f ../../../samples/quickstart/models/sklearn-income-classifier.yaml -n seldon-mesh
model.mlops.seldon.io/income-classifier created
We now have a deployed model with an associated endpoint.
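Optionally, you can wait for the Model to be scheduled onto a Server and report ready before sending any requests:
!kubectl wait --for condition=ready --timeout=300s model income-classifier -n seldon-mesh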
Make Requests
The endpoint exposed by the above deployment uses an IP from our service mesh, which we can obtain as follows:
MESH_IP = !kubectl get svc seldon-mesh -n seldon-mesh -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
MESH_IP = MESH_IP[0]
MESH_IP
'34.32.149.48'
endpoint = f"http://{MESH_IP}/v2/models/income-classifier/infer"
headers = {
"Seldon-Model": "income-classifier",
}
Requests are made using the Open Inference Protocol. More details on this specification can be found in our docs, or, for gRPC usage, in the API documentation generated by our protocol buffers here. This protocol is also shared by Triton Inference Server for serving deep learning models.
inference_request = {
"inputs": [
{
"name": "income",
"datatype": "INT64",
"shape": [1, 12],
"data": [53, 4, 0, 2, 8, 4, 2, 0, 0, 0, 60, 9]
}
]
}
We are now ready to send a request!
import requests
response = requests.post(endpoint, headers=headers, json=inference_request)
response.json()
{'model_name': 'income-classifier_1',
'model_version': '1',
'id': '626ebe8e-bc95-433f-8f5f-ef296625622a',
'parameters': {},
'outputs': [{'name': 'predict',
'shape': [1, 1],
'datatype': 'INT64',
'parameters': {'content_type': 'np'},
'data': [0]}]}
We can see above that the model returned 'data': [0] in its output. This is the model's prediction, indicating that an individual with the provided attributes is most likely making more than $50K/yr.
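If you prefer a human-readable label, you can read the prediction out of the response and map it yourself. A minimal sketch, assuming the interpretation stated above (0 meaning more than $50K/yr); verify this mapping against how your own model was trained:
# Extract the raw prediction and map it to a label.
# NOTE: the 0 -> ">50K" mapping follows the interpretation stated in this notebook.
prediction = response.json()["outputs"][0]["data"][0]
label = ">50K" if prediction == 0 else "<=50K"
print(f"Predicted income bracket: {label}")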
Step 3: Create and Deploy a 2-step Pipeline
Often we'd like to deploy AI applications that are more complex than an individual model. For example, around our model we might deploy pre- or post-processing steps, custom logic, other ML models, or drift and outlier detectors.
Deploy a Preprocessing step written in Python
In this example, we will create a preprocessing step that extracts numerical values from a text input for the model to use as its input. This will be implemented with custom Python logic and deployed as a custom model with MLServer:
import re
import numpy as np
# Extracts numerical values from a formatted text and outputs a vector of numerical values.
def extract_numerical_values(input_text):
# Find key-value pairs in text
pattern = r'"[^"]+":\s*"([^"]+)"'
matches = re.findall(pattern, input_text)
# Extract numerical values
numerical_values = []
for value in matches:
cleaned_value = value.replace(",", "")
if cleaned_value.isdigit(): # Integer
numerical_values.append(int(cleaned_value))
else:
try:
numerical_values.append(float(cleaned_value))
except ValueError:
pass
# Return array of numerical values
return np.array(numerical_values)
Before deploying the preprocessing step with Core 2, we will test it locally:
input_text = '''
"Age": "47",
"Workclass": "4",
"Education": "1",
"Marital Status": "1",
"Occupation": "1",
"Relationship": "0",
"Race": "4",
"Sex": "1",
"Capital Gain": "0",
"Capital Loss": "0",
"Hours per week": "68",
"Country": "9",
"Name": "John Doe"
'''
extract_numerical_values(input_text)
array([47, 4, 1, 1, 1, 0, 4, 1, 0, 0, 68, 9])
Now that we've tested the Python script locally, we will deploy the preprocessing step as a Model. This will allow us to connect it to our sklearn model using a Seldon Pipeline. To do so, we store an inference artefact (in this case, our Python script) in cloud storage alongside a model-settings.json file, similar to the model deployed above.
!gcloud storage ls --recursive gs://seldon-models/scv2/samples/preprocessor
gs://seldon-models/scv2/samples/preprocessor/:
gs://seldon-models/scv2/samples/preprocessor/model-settings.json
gs://seldon-models/scv2/samples/preprocessor/model.py
gs://seldon-models/scv2/samples/preprocessor/preprocessor.yaml
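The model.py script in the bucket wraps our extraction logic in MLServer's custom-runtime interface so that it can be served like any other model. Its exact contents are not reproduced in this notebook; below is a minimal sketch of what such a wrapper might look like (the class name and codec choices are illustrative assumptions, not the file stored in the bucket):
# model.py (illustrative sketch, not the exact file stored in the bucket)
import re

import numpy as np
from mlserver import MLModel
from mlserver.codecs import NumpyCodec, StringCodec
from mlserver.types import InferenceRequest, InferenceResponse


def extract_numerical_values(input_text: str) -> np.ndarray:
    # Same extraction logic as tested earlier in this notebook.
    pattern = r'"[^"]+":\s*"([^"]+)"'
    matches = re.findall(pattern, input_text)
    numerical_values = []
    for value in matches:
        cleaned_value = value.replace(",", "")
        if cleaned_value.isdigit():
            numerical_values.append(int(cleaned_value))
        else:
            try:
                numerical_values.append(float(cleaned_value))
            except ValueError:
                pass
    return np.array(numerical_values)


class Preprocessor(MLModel):
    async def load(self) -> bool:
        # No artefact to load; the preprocessor is pure Python.
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the BYTES input tensor into a string, run the extraction,
        # and encode the result as a numpy output tensor.
        input_text = StringCodec.decode_input(payload.inputs[0])[0]
        values = extract_numerical_values(input_text)
        output = NumpyCodec.encode_output(name="output", payload=values.reshape(1, -1))
        return InferenceResponse(model_name=self.name, outputs=[output])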
!cat ../../../samples/quickstart/models/preprocessor/preprocessor.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
name: preprocessor
spec:
storageUri: "gs://seldon-models/scv2/samples/preprocessor"
requirements:
- income-classifier-deps
As with the ML model deployed above, we have defined income-classifier-deps under requirements. This means that both the preprocessor and the model will be deployed on the same Server, consolidating resources and reducing overhead (for more about Multi-Model Serving, see here).
!kubectl apply -f ../../../samples/quickstart/models/preprocessor/preprocessor.yaml -n seldon-mesh
model.mlops.seldon.io/preprocessor created
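Since both Models declare the income-classifier-deps requirement, they should be scheduled onto the same mlserver-custom Server. You can optionally confirm this by listing the resources again; the LOADED MODELS count for mlserver-custom should now include both:
!kubectl get servers,models -n seldon-mesh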
We've now deployed the preprocessing step! Let's test it out by calling its endpoint:
endpoint_pp = f"http://{MESH_IP}/v2/models/preprocessor/infer"
headers_pp = {
"Seldon-Model": "preprocessor",
}
text_inference_request = {
"inputs": [
{
"name": "text_input",
"shape": [1],
"datatype": "BYTES",
"data": [input_text]
}
]
}
import requests
response = requests.post(endpoint_pp, headers=headers_pp, json=text_inference_request)
response.json()
{'model_name': 'preprocessor_1',
'model_version': '1',
'id': 'b26e49d5-2a4c-488b-8dff-0df850fbed3d',
'parameters': {},
'outputs': [{'name': 'output',
'shape': [1, 12],
'datatype': 'INT64',
'parameters': {'content_type': 'np'},
'data': [47, 4, 1, 1, 1, 0, 4, 1, 0, 0, 68, 9]}]}
Create and Deploy a Pipeline connecting our deployed Models
Now that we have our preprocessor and model deployed, we will chain them together with a pipeline.
!cat ../../../samples/quickstart/pipelines/income-classifier-app.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
name: income-classifier-app
spec:
steps:
- name: preprocessor
- name: income-classifier
inputs:
- preprocessor
output:
steps:
- income-classifier
The YAML defines a pipeline with two steps (the preprocessor and the model), mapping the output of the preprocessor model to the input of the income classification model. Seldon Core leverages Kafka to communicate between models, meaning that all data is streamed and observable in real time.
!kubectl apply -f ../../../samples/quickstart/pipelines/income-classifier-app.yaml -n seldon-mesh
pipeline.mlops.seldon.io/income-classifier-app created
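As with Models, you can optionally wait for the Pipeline to report ready before calling it:
!kubectl wait --for condition=ready --timeout=300s pipeline income-classifier-app -n seldon-mesh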
pipeline_endpoint = f"http://{MESH_IP}/v2/models/income-classifier-app/infer"
pipeline_headers = {
"Seldon-Model": "income-classifier-app.pipeline"
}
pipeline_response = requests.post(
pipeline_endpoint, json=text_inference_request, headers=pipeline_headers
)
pipeline_response.json()
{'model_name': '',
'outputs': [{'data': [0],
'name': 'predict',
'shape': [1, 1],
'datatype': 'INT64',
'parameters': {'content_type': 'np'}}]}
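Because the Pipeline steps communicate over Kafka, the data flowing between them remains observable after the request completes. If you have the Seldon CLI installed and configured to reach your cluster's Kafka broker (an optional step, not required for the rest of this notebook), you can inspect the most recent messages for each step:
!seldon pipeline inspect income-classifier-app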
Congratulations! You have now deployed a Seldon Pipeline that exposes an endpoint for your ML application 🥳. For more tutorials on how to use Core 2 for various use cases and requirements, see here.
Clean Up
!kubectl delete -f ../../../samples/quickstart/pipelines/income-classifier-app.yaml -n seldon-mesh
!kubectl delete -f ../../../samples/quickstart/models/preprocessor/preprocessor.yaml -n seldon-mesh
!kubectl delete -f ../../../samples/quickstart/models/sklearn-income-classifier.yaml -n seldon-mesh
!kubectl delete -f ../../../samples/quickstart/servers/mlserver-custom.yaml -n seldon-mesh
pipeline.mlops.seldon.io "income-classifier-app" deleted
model.mlops.seldon.io "preprocessor" deleted
model.mlops.seldon.io "income-classifier" deleted
server.mlops.seldon.io "mlserver-custom" deleted