The TensorFlow model takes two arrays as inputs and returns two arrays as outputs. The first output is the sum of the two inputs and the second output is their difference (first input minus second input).
Let's deploy the model:
seldon model load -f models/tfsimple1.yaml
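For reference, the manifest being loaded is a Seldon Core v2 Model resource. A sketch of what models/tfsimple1.yaml typically looks like is shown below; the storageUri and requirements are assumptions based on the standard tfsimple example and may differ in your samples folder:

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: tfsimple1
spec:
  # Assumed location of the example TensorFlow graph; check your samples folder
  storageUri: "gs://seldon-models/triton/simple"
  requirements:
  - tensorflow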
Deploy the TensorFlow Pipeline
Just as we did for the scikit-learn model, we'll deploy a simple pipeline for our TensorFlow model:
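A minimal Pipeline manifest for this could look roughly like the following sketch; the pipeline name and file path are assumptions, with the single step pointing at the tfsimple1 model deployed above:

apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: tfsimple
spec:
  steps:
    - name: tfsimple1
  output:
    steps:
    - tfsimple1

We would load it in the same way as the model, e.g. with seldon pipeline load -f pipelines/tfsimple.yaml (path assumed).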
Once we've deployed a model or pipeline to Seldon Core, we can list them and check their status by running:
seldon model list
and
seldon pipeline list
Your models and pipelines should show a state of ModelAvailable and PipelineReady respectively.
Test Predictions
Before we run a large batch job of predictions through our models and pipelines, let's quickly check that they work with a single standalone inference request. We can do this using the seldon model infer command.
The prediction request body needs to be an Open Inference Protocol-compatible payload and must also match the expected inputs for the model you've deployed. In this case, the iris model expects data of shape [1, 4] and of type FP32.
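For example, a request along these lines would work (a sketch: the input name "predict" and the data values are illustrative, based on the standard MLServer scikit-learn iris example):

seldon model infer iris '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'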
You'll notice that the prediction results for this request come back on outputs[0].data.
You'll notice that the inputs for our TensorFlow model look different from those we sent to the iris model. This time, we're sending two arrays of shape [1,16]. When sending an inference request, we can optionally choose which outputs we want back by including an {"outputs":...} object.
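A request of this form illustrates the payload shape (a sketch: it assumes the standard Triton simple example's INT32 inputs named INPUT0 and INPUT1, and selects only an output assumed to be named OUTPUT0):

# Sketch only: input/output names and datatype are assumptions from the Triton simple example
seldon model infer tfsimple1 '{"inputs": [{"name": "INPUT0", "shape": [1, 16], "datatype": "INT32", "data": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]}, {"name": "INPUT1", "shape": [1, 16], "datatype": "INT32", "data": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]}], "outputs": [{"name": "OUTPUT0"}]}'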
In the samples folder there is a batch request input file: batch-inputs/iris-input.txt. It contains 100 input payloads for our iris model. Let's take a look at the first line in that file:
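We can do that with:

cat batch-inputs/iris-input.txt | head -n 1 | jq

To send the whole file as a batch job against the deployed iris model, we use MLServer's batch inference CLI. A minimal sketch of the invocation is shown below; it assumes the mlserver CLI is installed locally and that the inference endpoint is reachable at localhost:9000, matching the server url reported in the logs that follow:

# Sketch: assumes MLServer's `mlserver infer` batch CLI and a locally reachable endpoint
mlserver infer -u localhost:9000 -m iris \
  -i batch-inputs/iris-input.txt -o /tmp/iris-output.txt --workers 5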
2023-01-22 18:24:17,272 [mlserver] INFO - Using asyncio event-loop policy: uvloop
2023-01-22 18:24:17,273 [mlserver] INFO - server url: localhost:9000
2023-01-22 18:24:17,273 [mlserver] INFO - model name: iris
2023-01-22 18:24:17,273 [mlserver] INFO - request headers: {}
2023-01-22 18:24:17,273 [mlserver] INFO - input file path: batch-inputs/iris-input.txt
2023-01-22 18:24:17,273 [mlserver] INFO - output file path: /tmp/iris-output.txt
2023-01-22 18:24:17,273 [mlserver] INFO - workers: 5
2023-01-22 18:24:17,273 [mlserver] INFO - retries: 3
2023-01-22 18:24:17,273 [mlserver] INFO - batch interval: 0.0
2023-01-22 18:24:17,274 [mlserver] INFO - batch jitter: 0.0
2023-01-22 18:24:17,274 [mlserver] INFO - connection timeout: 60
2023-01-22 18:24:17,274 [mlserver] INFO - micro-batch size: 1
2023-01-22 18:24:17,420 [mlserver] INFO - Finalizer: processed instances: 100
2023-01-22 18:24:17,421 [mlserver] INFO - Total processed instances: 100
2023-01-22 18:24:17,421 [mlserver] INFO - Time taken: 0.15 seconds
The mlserver batch component takes your input file batch-inputs/iris-input.txt, distributes those payloads across 5 different workers (--workers 5), collects the responses, and writes them to the file /tmp/iris-output.txt. For a full set of options, check out the MLServer CLI Reference.
Checking the Output
We can check the inference responses by looking at the contents of the output file:
cat /tmp/iris-output.txt | head -n 1 | jq
Iris Pipeline
We can run the same batch job for our iris pipeline and store the outputs in a different file:
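A sketch of the corresponding command, assuming the same local setup and that the pipeline is addressed by appending .pipeline to its name (as the model name in the logs below suggests):

# Sketch: same assumptions as the model batch job above
mlserver infer -u localhost:9000 -m iris-pipeline.pipeline \
  -i batch-inputs/iris-input.txt -o /tmp/iris-pipeline-output.txt --workers 5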
2023-01-22 18:25:18,651 [mlserver] INFO - Using asyncio event-loop policy: uvloop
2023-01-22 18:25:18,653 [mlserver] INFO - server url: localhost:9000
2023-01-22 18:25:18,653 [mlserver] INFO - model name: iris-pipeline.pipeline
2023-01-22 18:25:18,653 [mlserver] INFO - request headers: {}
2023-01-22 18:25:18,653 [mlserver] INFO - input file path: batch-inputs/iris-input.txt
2023-01-22 18:25:18,653 [mlserver] INFO - output file path: /tmp/iris-pipeline-output.txt
2023-01-22 18:25:18,653 [mlserver] INFO - workers: 5
2023-01-22 18:25:18,653 [mlserver] INFO - retries: 3
2023-01-22 18:25:18,653 [mlserver] INFO - batch interval: 0.0
2023-01-22 18:25:18,653 [mlserver] INFO - batch jitter: 0.0
2023-01-22 18:25:18,653 [mlserver] INFO - connection timeout: 60
2023-01-22 18:25:18,653 [mlserver] INFO - micro-batch size: 1
2023-01-22 18:25:18,963 [mlserver] INFO - Finalizer: processed instances: 100
2023-01-22 18:25:18,963 [mlserver] INFO - Total processed instances: 100
2023-01-22 18:25:18,963 [mlserver] INFO - Time taken: 0.31 seconds
Checking the Output
We can check the inference responses by looking at the contents of the output file:
cat /tmp/iris-pipeline-output.txt | head -n 1 | jq
Running the TensorFlow Batch Job
The samples folder contains an example batch input for the TensorFlow model, just as it did for the scikit-learn model. You can find it at batch-inputs/tfsimple-input.txt. Let's take a look at the first inference request in the file:
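Following the same pattern as for the iris inputs:

cat batch-inputs/tfsimple-input.txt | head -n 1 | jq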