Batch Inference examples (local)

This example walks you through a series of batch inference requests made to both models and pipelines running on a local Seldon Core installation.

Setup

If you haven't already, you'll need to clone the Seldon Core repository and run it locally before you run through this example.

Note: By default, the CLI expects your inference endpoint to be at 0.0.0.0:9000. If you have customized this, you'll need to point the CLI at your endpoint instead.

Deploy Models and Pipelines

First, let's move into the samples folder, where we'll find some sample models and pipelines we can use:

cd samples/

Deploy the Iris Model

Let's take a look at a sample model before we deploy it:

cat models/sklearn-iris-gs.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://seldon-models/mlserver/iris"
  requirements:
  - sklearn
  memory: 100Ki

The above manifest will deploy a simple scikit-learn model based on the iris dataset.

Let's now deploy that model using the Seldon CLI:

Deploy the Iris Pipeline

Now that we've deployed our iris model, let's create a pipeline around the model.

We see that this pipeline only has one step, which is to call the iris model we deployed earlier. We can create the pipeline by running:

Deploy the Tensorflow Model

To demonstrate batch inference requests to different types of models, we'll also deploy a simple tensorflow model:

The tensorflow model takes two arrays as inputs and returns two arrays as outputs. The first output is the sum of the two inputs and the second output is their difference (first input minus second input).
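To make the expected outputs concrete, here is a minimal pure-Python sketch of the arithmetic described above. It is not the deployed model: the function name `tfsimple_reference` is hypothetical, and it assumes the addition and subtraction are applied element-wise to two arrays of equal length.

```python
def tfsimple_reference(input0, input1):
    """Return (sum, difference) of two equal-length lists.

    Assumption: the model combines its inputs element-wise.
    """
    if len(input0) != len(input1):
        raise ValueError("inputs must have the same length")
    added = [a + b for a, b in zip(input0, input1)]
    subtracted = [a - b for a, b in zip(input0, input1)]
    return added, subtracted


# Two identical 16-element inputs, chosen only for illustration.
first = list(range(1, 17))
second = list(range(1, 17))
added, subtracted = tfsimple_reference(first, second)
print(added)
print(subtracted)
```

A reference like this can be handy later for sanity-checking the responses that come back from the model.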

Let's deploy the model:

Deploy the Tensorflow Pipeline

Just as we did for the scikit-learn model, we'll deploy a simple pipeline for our tensorflow model:

Inspect the pipeline manifest:

and deploy it:

Check Model and Pipeline Status

Once we've deployed a model or pipeline to Seldon Core, we can list them and check their status by running:

and

Your models and pipelines should be showing a state of ModelAvailable and PipelineReady respectively.

Test Predictions

Before we run a large batch job of predictions through our models and pipelines, let's quickly check that they work with a single standalone inference request. We can do this using the seldon model infer command.

Scikit-learn Model

The prediction request body needs to be an Open Inference Protocol compatible payload and must also match the expected inputs for the model you've deployed. In this case, the iris model expects data of shape [1, 4] and of type FP32.

You'll notice that the prediction results for this request come back on outputs[0].data.
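As a rough sketch of what such a payload and its response look like, the Python below builds a request with shape [1, 4] and datatype FP32, then reads the prediction from outputs[0].data. The input name `predict`, the helper names, and the sample response are assumptions made for illustration; the response is hand-written, not captured from a running model.

```python
import json


def build_iris_request(features):
    """Wrap one row of four features in an Open Inference Protocol request."""
    if len(features) != 4:
        raise ValueError("the iris model expects exactly 4 features")
    return {
        "inputs": [
            {
                "name": "predict",  # assumed input name
                "shape": [1, 4],
                "datatype": "FP32",
                "data": [[float(x) for x in features]],
            }
        ]
    }


def first_output_data(response):
    """Return the data of the first output tensor in a response."""
    return response["outputs"][0]["data"]


request = build_iris_request([1, 2, 3, 4])
request_json = json.dumps(request)  # the string you would send as the body
print(request_json)

# Hypothetical response, written by hand to show where the result lives.
sample_response = {
    "model_name": "iris",
    "outputs": [
        {"name": "predict", "shape": [1, 1], "datatype": "INT64", "data": [2]}
    ],
}
prediction = first_output_data(sample_response)
print(prediction)
```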

Scikit-learn Pipeline

Tensorflow Model

You'll notice that the inputs for our tensorflow model look different from the ones we sent to the iris model. This time, we're sending two arrays of shape [1,16]. When sending an inference request, we can optionally choose which outputs we want back by including an {"outputs":...} object.
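The following sketch shows one way to assemble a request of this shape, including the optional outputs selection. The tensor names `INPUT0`, `INPUT1`, `OUTPUT0` and the `INT32` datatype are assumptions for illustration, as is the helper name; check the model's metadata for the actual values.

```python
import json


def build_tfsimple_request(input0, input1, wanted_outputs=None):
    """Build a two-input request, optionally restricting returned outputs."""
    if len(input0) != 16 or len(input1) != 16:
        raise ValueError("each input must contain 16 values")
    request = {
        "inputs": [
            {
                "name": "INPUT0",  # assumed tensor name
                "shape": [1, 16],
                "datatype": "INT32",  # assumed datatype
                "data": [list(input0)],
            },
            {
                "name": "INPUT1",  # assumed tensor name
                "shape": [1, 16],
                "datatype": "INT32",  # assumed datatype
                "data": [list(input1)],
            },
        ]
    }
    if wanted_outputs:
        # Only the named outputs are requested back.
        request["outputs"] = [{"name": name} for name in wanted_outputs]
    return request


values = list(range(1, 17))
full_request = build_tfsimple_request(values, values)
selective_request = build_tfsimple_request(values, values, ["OUTPUT0"])
print(json.dumps(selective_request))
```

Leaving `wanted_outputs` unset omits the outputs object entirely, which corresponds to asking for every output.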

Tensorflow Pipeline

Running the Scikit-Learn Batch Job

In the samples folder there is a batch request input file: batch-inputs/iris-input.txt. It contains 100 input payloads for our iris model. Let's take a look at the first line in that file:
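As a rough illustration of a line-delimited batch file of this kind, the sketch below writes one JSON payload per line to a temporary file and reads the first line back. The feature values, the input name `predict`, the file name, and the helper name are all made up for the example; they are not taken from iris-input.txt.

```python
import json
import os
import tempfile


def write_batch_file(path, rows):
    """Write one Open Inference Protocol payload per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            payload = {
                "inputs": [
                    {
                        "name": "predict",  # assumed input name
                        "shape": [1, 4],
                        "datatype": "FP32",
                        "data": [row],
                    }
                ]
            }
            handle.write(json.dumps(payload) + "\n")


# Illustrative rows only; the real file holds 100 payloads.
rows = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3], [7.7, 3.0, 6.1, 2.3]]
batch_path = os.path.join(tempfile.mkdtemp(), "example-input.txt")
write_batch_file(batch_path, rows)

with open(batch_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
first_payload = json.loads(lines[0])
print(first_payload)
```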

To run a batch inference job we'll use the MLServer CLI. If you don't already have it installed you can install it using:

Iris Model

The inference job can be executed by running the following command:

The mlserver batch component will take your input file batch-inputs/iris-input.txt, distribute those payloads across 5 different workers (--workers 5), collect the responses, and write them to a file, /tmp/iris-output.txt. For a full set of options, check out the MLServer CLI Reference.
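The idea of fanning payloads out to a fixed number of workers can be sketched in a few lines of Python. This is a conceptual illustration only, not how MLServer is implemented: `fake_infer` is a hypothetical stand-in that never contacts a server, and the thread pool is simply one convenient way to model 5 workers.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def fake_infer(payload_line):
    """Stand-in for a real inference call; reports how many inputs it saw."""
    payload = json.loads(payload_line)
    response = {"outputs": [{"name": "echo", "data": [len(payload["inputs"])]}]}
    return json.dumps(response)


def run_batch(lines, workers=5):
    """Send each payload line to a pool of workers and collect the responses."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order in this sketch.
        return list(pool.map(fake_infer, lines))


# Ten made-up payloads, one JSON document per entry.
payload_lines = [
    json.dumps(
        {
            "inputs": [
                {
                    "name": "predict",  # assumed input name
                    "shape": [1, 4],
                    "datatype": "FP32",
                    "data": [[float(i)] * 4],
                }
            ]
        }
    )
    for i in range(10)
]
responses = run_batch(payload_lines, workers=5)
print(len(responses))
```

The worker count bounds how many requests are in flight at once, so raising it trades load on the endpoint for throughput.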

Checking the Output

We can check the inference responses by looking at the contents of the output file:
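If you would rather inspect the responses programmatically, a sketch along the following lines may help. It assumes the output holds one JSON response per line with predictions under outputs[0].data; the sample lines are hand-written for illustration rather than copied from /tmp/iris-output.txt, and `collect_predictions` is a hypothetical helper.

```python
import json


def collect_predictions(lines):
    """Extract outputs[0].data from each non-empty JSON line."""
    predictions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        response = json.loads(line)
        predictions.append(response["outputs"][0]["data"])
    return predictions


# Hypothetical responses standing in for the contents of an output file.
sample_lines = [
    '{"model_name": "iris", "outputs": [{"name": "predict", "data": [0]}]}',
    '{"model_name": "iris", "outputs": [{"name": "predict", "data": [2]}]}',
    "",
]
predictions = collect_predictions(sample_lines)
print(predictions)
```

To apply this to a real output file, pass the open file handle to `collect_predictions` in place of `sample_lines`.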

Iris Pipeline

We can run the same batch job for our iris pipeline and store the outputs in a different file:

Checking the Output

We can check the inference responses by looking at the contents of the output file:

Running the Tensorflow Batch Job

The samples folder contains an example batch input for the tensorflow model, just as it did for the scikit-learn model. You can find it at batch-inputs/tfsimple-input.txt. Let's take a look at the first inference request in the file:

Tensorflow Model

As before, we can run the inference batch job using the mlserver infer command:

Checking the Output

We can check the inference responses by looking at the contents of the output file:

You should get the following response:

Tensorflow Pipeline

Checking the Output

We can check the inference responses by looking at the contents of the output file:

Cleaning Up

Now that we've run our batch examples, let's remove the models and pipelines we created:

And finally let's spin down our local instance of Seldon Core:
