Python Serialization Cost Benchmark

Prerequisites

  • An authenticated Kubernetes cluster with Istio and Seldon Core installed

    • You can use the Ansible seldon-core playbook at https://github.com/SeldonIO/ansible-k8s-collection

  • The vegeta and ghz benchmarking tools

Port-forward to the Istio ingress gateway

kubectl port-forward $(kubectl get pods -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].metadata.name}') -n istio-system 8003:8080

Tests

  • Large Batch Size

    • predict method with:

      • REST

        • ndarray

        • tensor

        • tftensor

      • gRPC

        • ndarray

        • tensor

        • tftensor

    • predict_raw method with:

      • REST

        • ndarray

        • tensor

        • tftensor

      • gRPC

        • ndarray

        • tensor

        • tftensor

  • Small Batch Size

    • predict method with:

      • REST

        • ndarray

        • tensor

        • tftensor

      • gRPC

        • ndarray

        • tensor

        • tftensor

TLDR

  • gRPC is faster than REST

  • tftensor is best for large batch size

  • ndarray with gRPC is bad for large batch size

  • The simpler tensor/ndarray payloads are better for small batch sizes

from IPython.core.magic import register_line_cell_magic


@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, "w") as f:
        f.write(cell.format(**globals()))
VERSION = !cat ../../../version.txt
VERSION = VERSION[0]
VERSION
'1.10.0-dev'
!kubectl create namespace seldon
Error from server (AlreadyExists): namespaces "seldon" already exists
!helm upgrade --install seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --version 1.9.0 --namespace seldon-system --set istio.enabled="true" --set istio.gateway="seldon-gateway.istio-system.svc.cluster.local"
Release "seldon-core" has been upgraded. Happy Helming!
NAME: seldon-core
LAST DEPLOYED: Thu Jul  1 14:03:55 2021
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 2
TEST SUITE: None

Test with Predict method on Large Batch Size

The seldontest_predict image exposes a simple predict method that runs a loop with a configurable number of iterations (default 1) to simulate work. The iteration count can be set as a Seldon parameter, but since we want to benchmark the serialization/deserialization cost we keep the amount of work minimal.
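
For context, a minimal sketch of what such a model class might look like, assuming the standard Seldon Python wrapper interface (this is a hypothetical reconstruction, not the actual seldontest_predict source):

# Hypothetical sketch of the benchmark model: a plain class with a predict
# method, as expected by the Seldon Python wrapper.
class Model:
    def __init__(self, loops: int = 1):
        # "loops" is the configurable iteration count; the default of 1
        # keeps compute negligible so serialization cost dominates.
        self.loops = int(loops)

    def predict(self, X, names=None, meta=None):
        result = 0
        for _ in range(self.loops):
            result += 1  # trivial simulated work
        return [result]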

%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/seldontest_predict:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
          env:
          - name: GUNICORN_WORKERS
            value: "1"
          - name: GUNICORN_THREADS
            value: "1"
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    graph:
      children: []
      name: classifier
      type: MODEL
    name: default
    replicas: 1
!kubectl apply -f model.yaml
seldondeployment.machinelearning.seldon.io/seldon-model created
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon
pod/seldon-model-default-0-classifier-5445bd4ccf-c2vdr condition met

Create payloads and associated vegeta configurations for

  1. ndarray

  2. tensor

  3. tftensor

We will create an array of 100,000 consecutive integers.

import json

sz = 100000
vals = list(range(sz))
valStr = f"{vals}"  # renders as "[0, 1, 2, ...]"
# ndarray payload: a single row of sz values
payload = '{"data": {"ndarray": [' + valStr + "]}}"
with open("data_ndarray.json", "w") as f:
    f.write(payload)
# tensor payload: the same values with an explicit [1, sz] shape
payload_tensor = (
    '{"data":{"tensor":{"shape":[1,' + str(sz) + '],"values":' + valStr + "}}}"
)
with open("data_tensor.json", "w") as f:
    f.write(payload_tensor)
import numpy as np
import tensorflow as tf
from google.protobuf import json_format

# tftensor payload: encode the array as a TensorProto and embed its JSON form
array = np.array(vals)
tftensor = tf.make_tensor_proto(array)
jStrTensor = json_format.MessageToJson(tftensor)
jTensor = json.loads(jStrTensor)
payload_tftensor = (
    '{"data":{"tftensor":' + json.dumps(jTensor, separators=(",", ":")) + "}}"
)
with open("data_tftensor.json", "w") as f:
    f.write(payload_tftensor)
import base64
import json

# vegeta's JSON target format expects the request body base64-encoded
sample_string_bytes = payload_tensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_ndarray.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")


sample_string_bytes = payload_tftensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tftensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")
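
As a quick sanity check, we can decode a target's base64 body and confirm it round-trips back to the original payload:

# Optional sanity check: the base64 body in the vegeta target should decode
# back to the payload we serialized above.
import base64
import json

with open("vegeta_tensor.json") as f:
    target = json.loads(f.readline())
assert base64.b64decode(target["body"]).decode("ascii") == payload_tensor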

Smoke-test the port-forward to check everything is working

!curl -X POST -H 'Content-Type: application/json' \
   -d '@./data_ndarray.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
{"data":{"names":[],"ndarray":[1]},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}
!curl -X POST -H 'Content-Type: application/json' \
   -d '@./data_tensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
{"data":{"names":[],"tensor":{"shape":[1],"values":[1]}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}
!curl -X POST -H 'Content-Type: application/json' \
   -d '@./data_tftensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
{"data":{"names":[],"tftensor":{"dtype":"DT_INT64","int64Val":["1"],"tensorShape":{"dim":[{"size":"1"}]}}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}

Test REST

  1. ndarray

  2. tensor

  3. tftensor

This can be run locally, as we are interested in the relative differences between payload types rather than highly accurate absolute timings.

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         518, 51.76, 51.66
Duration      [total, attack, wait]             10.027s, 10.008s, 19.333ms
Latencies     [min, mean, 50, 90, 95, 99, max]  17.337ms, 19.355ms, 19.136ms, 20.336ms, 21.214ms, 24.886ms, 27.831ms
Bytes In      [total, mean]                     59570, 115.00
Bytes Out     [total, mean]                     356857970, 688915.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:518  
Error Set:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         504, 50.35, 50.25
Duration      [total, attack, wait]             10.03s, 10.01s, 19.353ms
Latencies     [min, mean, 50, 90, 95, 99, max]  17.885ms, 19.897ms, 19.616ms, 21.1ms, 22.205ms, 25.498ms, 34.99ms
Bytes In      [total, mean]                     69048, 137.00
Bytes Out     [total, mean]                     347225760, 688940.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:504  
Error Set:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         636, 63.55, 63.45
Duration      [total, attack, wait]             10.023s, 10.008s, 14.782ms
Latencies     [min, mean, 50, 90, 95, 99, max]  13.646ms, 15.756ms, 15.461ms, 17.41ms, 18.729ms, 20.628ms, 23.465ms
Bytes In      [total, mean]                     118932, 187.00
Bytes Out     [total, mean]                     678466356, 1066771.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:636  
Error Set:

Example results

  • ndarray: 19.8ms

  • tensor: 19.7ms

  • tftensor: 16.2ms

Test gRPC

  1. ndarray

  2. tensor

  3. tftensor

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_ndarray.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	24
  Total:	10.13 s
  Slowest:	278.81 ms
  Fastest:	242.25 ms
  Average:	244.06 ms
  Requests/sec:	2.37

Response time histogram:
  242.253 [1]	|∎∎∎∎∎∎∎
  245.909 [2]	|∎∎∎∎∎∎∎∎∎∎∎∎∎
  249.564 [4]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  253.219 [6]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  256.874 [4]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  260.530 [1]	|∎∎∎∎∎∎∎
  264.185 [1]	|∎∎∎∎∎∎∎
  267.840 [2]	|∎∎∎∎∎∎∎∎∎∎∎∎∎
  271.496 [0]	|
  275.151 [1]	|∎∎∎∎∎∎∎
  278.806 [1]	|∎∎∎∎∎∎∎

Latency distribution:
  10 % in 247.44 ms 
  25 % in 249.47 ms 
  50 % in 252.85 ms 
  75 % in 260.70 ms 
  90 % in 272.55 ms 
  95 % in 278.81 ms 
  0 % in 0 ns 

Status code distribution:
  [OK]         23 responses   
  [Canceled]   1 responses    

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	92
  Total:	10.10 s
  Slowest:	21.23 ms
  Fastest:	4.91 ms
  Average:	7.58 ms
  Requests/sec:	9.11

Response time histogram:
  4.906 [1]	|∎
  6.539 [55]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  8.171 [17]	|∎∎∎∎∎∎∎∎∎∎∎∎
  9.804 [4]	|∎∎∎
  11.436 [0]	|
  13.069 [4]	|∎∎∎
  14.701 [3]	|∎∎
  16.334 [3]	|∎∎
  17.966 [0]	|
  19.599 [2]	|∎
  21.232 [2]	|∎

Latency distribution:
  10 % in 5.51 ms 
  25 % in 5.70 ms 
  50 % in 6.14 ms 
  75 % in 7.09 ms 
  90 % in 14.14 ms 
  95 % in 18.77 ms 
  0 % in 0 ns 

Status code distribution:
  [OK]         91 responses   
  [Canceled]   1 responses    

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tftensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	425
  Total:	10.04 s
  Slowest:	16.38 ms
  Fastest:	3.97 ms
  Average:	5.33 ms
  Requests/sec:	42.31

Response time histogram:
  3.970 [1]	|
  5.211 [281]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6.452 [91]	|∎∎∎∎∎∎∎∎∎∎∎∎∎
  7.692 [25]	|∎∎∎∎
  8.933 [8]	|∎
  10.174 [6]	|∎
  11.415 [7]	|∎
  12.656 [2]	|
  13.896 [1]	|
  15.137 [1]	|
  16.378 [1]	|

Latency distribution:
  10 % in 4.34 ms 
  25 % in 4.54 ms 
  50 % in 4.89 ms 
  75 % in 5.52 ms 
  90 % in 6.79 ms 
  95 % in 8.30 ms 
  99 % in 11.71 ms 

Status code distribution:
  [OK]         424 responses   
  [Canceled]   1 responses     

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   

Example results

  • ndarray: 253ms

  • tensor: 8.4ms

  • tftensor: 5.5ms

Conclusions

  • gRPC is generally faster than REST, except for ndarray, which is much worse and should not be used with gRPC; the ndarray payload is encoded as a generic protobuf ListValue, which is far more expensive to deserialize than the packed numeric values used by tensor and tftensor

  • tftensor is fastest

!kubectl delete -f model.yaml
seldondeployment.machinelearning.seldon.io "seldon-model" deleted

Test Predict Raw
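
Unlike predict, predict_raw receives the request payload as-is, so the Python wrapper skips its usual parsing of the data into numpy arrays. A minimal sketch of such a model follows (hypothetical; the actual seldontest_predict_raw source lives in the Seldon Core repo):

# Hypothetical sketch of a predict_raw model. The wrapper hands over the raw
# request (a dict for REST, a protobuf message for gRPC) without parsing it.
class Model:
    def predict_raw(self, msg):
        # Ignore the payload and return a trivial response, so timings
        # reflect transport cost plus the serialization work now skipped.
        return [1]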

%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/seldontest_predict_raw:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
          env:
          - name: GUNICORN_WORKERS
            value: "1"
          - name: GUNICORN_THREADS
            value: "1"
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    graph:
      children: []
      name: classifier
      type: MODEL
    name: default
    replicas: 1
!kubectl apply -f model.yaml
seldondeployment.machinelearning.seldon.io/seldon-model created
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon
pod/seldon-model-default-0-classifier-5dc8fbd597-kk7td condition met

Smoke-test the port-forward to check everything is working

!curl -X POST -H 'Content-Type: application/json' \
   -d '@./data_tftensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
[1]

Test REST

  1. ndarray

  2. tensor

  3. tftensor

This can be run locally, as we are interested in the relative differences between payload types rather than highly accurate absolute timings.

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         724, 72.35, 72.25
Duration      [total, attack, wait]             10.021s, 10.007s, 14.458ms
Latencies     [min, mean, 50, 90, 95, 99, max]  12.228ms, 13.838ms, 13.683ms, 14.641ms, 15.489ms, 17.888ms, 22.263ms
Bytes In      [total, mean]                     2896, 4.00
Bytes Out     [total, mean]                     498774460, 688915.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:724  
Error Set:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         724, 72.32, 72.22
Duration      [total, attack, wait]             10.025s, 10.011s, 14.307ms
Latencies     [min, mean, 50, 90, 95, 99, max]  12.362ms, 13.844ms, 13.701ms, 14.655ms, 15.493ms, 17.976ms, 18.802ms
Bytes In      [total, mean]                     2896, 4.00
Bytes Out     [total, mean]                     498792560, 688940.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:724  
Error Set:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         901, 90.04, 89.93
Duration      [total, attack, wait]             10.018s, 10.007s, 11.64ms
Latencies     [min, mean, 50, 90, 95, 99, max]  8.955ms, 11.116ms, 10.994ms, 12.099ms, 12.721ms, 15.208ms, 19.918ms
Bytes In      [total, mean]                     3604, 4.00
Bytes Out     [total, mean]                     961160671, 1066771.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:901  
Error Set:

Example results

  • ndarray: 13.3ms

  • tensor: 13.3ms

  • tftensor: 11.1ms

Test gRPC

  1. ndarray

  2. tensor

  3. tftensor

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_ndarray.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	44
  Total:	10.04 s
  Slowest:	69.07 ms
  Fastest:	44.44 ms
  Average:	46.03 ms
  Requests/sec:	4.38

Response time histogram:
  44.440 [1]	|∎
  46.904 [31]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  49.367 [6]	|∎∎∎∎∎∎∎∎
  51.831 [2]	|∎∎∎
  54.294 [2]	|∎∎∎
  56.758 [0]	|
  59.221 [0]	|
  61.684 [0]	|
  64.148 [0]	|
  66.611 [0]	|
  69.075 [1]	|∎

Latency distribution:
  10 % in 45.05 ms 
  25 % in 45.40 ms 
  50 % in 46.30 ms 
  75 % in 47.34 ms 
  90 % in 50.16 ms 
  95 % in 53.38 ms 
  0 % in 0 ns 

Status code distribution:
  [OK]         43 responses   
  [Canceled]   1 responses    

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	92
  Total:	10.10 s
  Slowest:	19.81 ms
  Fastest:	4.93 ms
  Average:	7.91 ms
  Requests/sec:	9.11

Response time histogram:
  4.932 [1]	|∎
  6.419 [53]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  7.907 [12]	|∎∎∎∎∎∎∎∎∎
  9.395 [5]	|∎∎∎∎
  10.882 [4]	|∎∎∎
  12.370 [1]	|∎
  13.858 [3]	|∎∎
  15.346 [3]	|∎∎
  16.833 [2]	|∎∎
  18.321 [3]	|∎∎
  19.809 [4]	|∎∎∎

Latency distribution:
  10 % in 5.21 ms 
  25 % in 5.68 ms 
  50 % in 6.04 ms 
  75 % in 8.27 ms 
  90 % in 15.77 ms 
  95 % in 19.04 ms 
  0 % in 0 ns 

Status code distribution:
  [OK]         91 responses   
  [Canceled]   1 responses    

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tftensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	426
  Total:	10.03 s
  Slowest:	11.74 ms
  Fastest:	3.67 ms
  Average:	5.02 ms
  Requests/sec:	42.48

Response time histogram:
  3.668 [1]	|
  4.475 [174]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.282 [141]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6.089 [43]	|∎∎∎∎∎∎∎∎∎∎
  6.897 [30]	|∎∎∎∎∎∎∎
  7.704 [16]	|∎∎∎∎
  8.511 [6]	|∎
  9.318 [8]	|∎∎
  10.126 [2]	|
  10.933 [1]	|
  11.740 [3]	|∎

Latency distribution:
  10 % in 4.08 ms 
  25 % in 4.27 ms 
  50 % in 4.61 ms 
  75 % in 5.30 ms 
  90 % in 6.62 ms 
  95 % in 7.66 ms 
  99 % in 10.26 ms 

Status code distribution:
  [OK]         425 responses   
  [Canceled]   1 responses     

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   

Example results

ndarray
tensor
tftensor

46ms

7.9ms

5.0ms

Conclusions

  • predict_raw is faster than predict, but you will need to handle the serialization/deserialization yourself (see the sketch below), which may make the two approaches equivalent in practice unless specific techniques can be applied for your use case.
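
For example, a predict_raw model that needs the tftensor data as a numpy array has to decode the protobuf itself. A sketch, assuming the REST path where the message arrives as a parsed JSON dict:

# Sketch of manual deserialization inside predict_raw (REST path assumed:
# msg is the parsed JSON request, e.g. {"data": {"tftensor": {...}}}).
import tensorflow as tf
from google.protobuf import json_format
from tensorflow.core.framework.tensor_pb2 import TensorProto

def decode_tftensor(msg: dict):
    proto = json_format.ParseDict(msg["data"]["tftensor"], TensorProto())
    return tf.make_ndarray(proto)  # back to a numpy array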

Test with Predict method on Small Batch Size

The seldontest_predict image exposes a simple predict method that runs a loop with a configurable number of iterations (default 1) to simulate work. The iteration count can be set as a Seldon parameter, but since we want to benchmark the serialization/deserialization cost we keep the amount of work minimal.

%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/seldontest_predict:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
          env:
          - name: GUNICORN_WORKERS
            value: "1"
          - name: GUNICORN_THREADS
            value: "1"
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    graph:
      children: []
      name: classifier
      type: MODEL
    name: default
    replicas: 1
!kubectl apply -f model.yaml
seldondeployment.machinelearning.seldon.io/seldon-model configured
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon
pod/seldon-model-default-0-classifier-5445bd4ccf-bgkcm condition met

Create payloads and associated vegeta configurations for

  1. ndarray

  2. tensor

  3. tftensor

This time we will create an array containing a single integer (sz = 1).

import json

sz = 1
vals = list(range(sz))
valStr = f"{vals}"
payload = '{"data": {"ndarray": [' + valStr + "]}}"
with open("data_ndarray.json", "w") as f:
    f.write(payload)
payload_tensor = (
    '{"data":{"tensor":{"shape":[1,' + str(sz) + '],"values":' + valStr + "}}}"
)
with open("data_tensor.json", "w") as f:
    f.write(payload_tensor)
import numpy as np
import tensorflow as tf
from google.protobuf import json_format

array = np.array(vals)
tftensor = tf.make_tensor_proto(array)
jStrTensor = json_format.MessageToJson(tftensor)
jTensor = json.loads(jStrTensor)
payload_tftensor = (
    '{"data":{"tftensor":' + json.dumps(jTensor, separators=(",", ":")) + "}}"
)
with open("data_tftensor.json", "w") as f:
    f.write(payload_tftensor)
import base64
import json

sample_string_bytes = payload_tensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_ndarray.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")


sample_string_bytes = payload_tftensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tftensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

Smoke-test the port-forward to check everything is working

!curl -X POST -H 'Content-Type: application/json' \
   -d '@./data_tensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
{"data":{"names":[],"tensor":{"shape":[1],"values":[1]}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}

Test REST

  1. ndarray

  2. tensor

  3. tftensor

This can be run locally, as we are interested in the relative differences between payload types rather than highly accurate absolute timings.

%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         5538, 553.80, 553.67
Duration      [total, attack, wait]             10.002s, 10s, 2.364ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.569ms, 1.804ms, 1.739ms, 1.984ms, 2.198ms, 2.861ms, 6.62ms
Bytes In      [total, mean]                     636870, 115.00
Bytes Out     [total, mean]                     155064, 28.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5538  
Error Set:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         5557, 555.65, 555.55
Duration      [total, attack, wait]             10.003s, 10.001s, 1.753ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.578ms, 1.798ms, 1.74ms, 1.925ms, 2.119ms, 2.981ms, 5.968ms
Bytes In      [total, mean]                     761309, 137.00
Bytes Out     [total, mean]                     266736, 48.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5557  
Error Set:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json | 
  vegeta report -type=text
Requests      [total, rate, throughput]         4548, 454.75, 454.65
Duration      [total, attack, wait]             10.003s, 10.001s, 2.141ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.937ms, 2.197ms, 2.138ms, 2.351ms, 2.482ms, 3.215ms, 9.424ms
Bytes In      [total, mean]                     850476, 187.00
Bytes Out     [total, mean]                     436608, 96.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:4548  
Error Set:

Example results

  • ndarray: 1.8ms

  • tensor: 1.8ms

  • tftensor: 2.1ms

Test gRPC

  1. ndarray

  2. tensor

  3. tftensor

%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_ndarray.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	6506
  Total:	10.01 s
  Slowest:	18.58 ms
  Fastest:	1.26 ms
  Average:	1.46 ms
  Requests/sec:	650.23

Response time histogram:
  1.260 [1]	|
  2.992 [6465]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.724 [30]	|
  6.456 [5]	|
  8.187 [2]	|
  9.919 [1]	|
  11.651 [0]	|
  13.382 [0]	|
  15.114 [0]	|
  16.846 [0]	|
  18.578 [1]	|

Latency distribution:
  10 % in 1.33 ms 
  25 % in 1.36 ms 
  50 % in 1.39 ms 
  75 % in 1.45 ms 
  90 % in 1.58 ms 
  95 % in 1.79 ms 
  99 % in 2.50 ms 

Status code distribution:
  [OK]            6505 responses   
  [Unavailable]   1 responses      

Error distribution:
  [1]   rpc error: code = Unavailable desc = transport is closing   
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	6429
  Total:	10.01 s
  Slowest:	16.30 ms
  Fastest:	1.29 ms
  Average:	1.49 ms
  Requests/sec:	642.56

Response time histogram:
  1.287 [1]	|
  2.789 [6375]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.290 [36]	|
  5.792 [11]	|
  7.293 [2]	|
  8.795 [1]	|
  10.296 [0]	|
  11.798 [1]	|
  13.299 [0]	|
  14.801 [0]	|
  16.303 [1]	|

Latency distribution:
  10 % in 1.36 ms 
  25 % in 1.38 ms 
  50 % in 1.42 ms 
  75 % in 1.48 ms 
  90 % in 1.60 ms 
  95 % in 1.80 ms 
  99 % in 2.67 ms 

Status code distribution:
  [OK]            6428 responses   
  [Unavailable]   1 responses      

Error distribution:
  [1]   rpc error: code = Unavailable desc = transport is closing   
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tftensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003
Summary:
  Count:	6066
  Total:	10.01 s
  Slowest:	9.38 ms
  Fastest:	1.39 ms
  Average:	1.57 ms
  Requests/sec:	606.20

Response time histogram:
  1.387 [1]	|
  2.187 [5945]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.986 [84]	|∎
  3.785 [20]	|
  4.585 [7]	|
  5.384 [2]	|
  6.183 [4]	|
  6.983 [0]	|
  7.782 [0]	|
  8.582 [1]	|
  9.381 [1]	|

Latency distribution:
  10 % in 1.46 ms 
  25 % in 1.48 ms 
  50 % in 1.52 ms 
  75 % in 1.57 ms 
  90 % in 1.66 ms 
  95 % in 1.81 ms 
  99 % in 2.61 ms 

Status code distribution:
  [OK]            6065 responses   
  [Unavailable]   1 responses      

Error distribution:
  [1]   rpc error: code = Unavailable desc = transport is closing   

Example results

  • ndarray: 1.46ms

  • tensor: 1.49ms

  • tftensor: 1.57ms

Conclusions

  • gRPC is generally faster than REST

  • There is very little difference between payload types, with the simpler tensor/ndarray formats probably being slightly faster

!kubectl delete -f model.yaml
seldondeployment.machinelearning.seldon.io "seldon-model" deleted
