Python Serialization Cost Benchmark

Prerequisites

  • An authenticated Kubernetes cluster with Istio and Seldon Core installed

    • You can use the Ansible seldon-core playbook at https://github.com/SeldonIO/ansible-k8s-collection

  • The vegeta and ghz benchmarking tools

Port-forward to the Istio ingress gateway

kubectl port-forward $(kubectl get pods -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].metadata.name}') -n istio-system 8003:8080

Tests

  • Large Batch Size

    • predict method with:

      • REST

        • ndarray

        • tensor

        • tftensor

      • gRPC

        • ndarray

        • tensor

        • tftensor

    • predict_raw method with:

      • REST

        • ndarray

        • tensor

        • tftensor

      • gRPC

        • ndarray

        • tensor

        • tftensor

  • Small Batch Size

    • predict method with:

      • REST

        • ndarray

        • tensor

        • tftensor

      • gRPC

        • ndarray

        • tensor

        • tftensor

TL;DR

  • gRPC is generally faster than REST

  • tftensor is the best payload type for large batch sizes

  • ndarray over gRPC performs very poorly for large batch sizes

  • The simpler tensor/ndarray payloads are slightly better for small batch sizes

Test with Predict method on Large Batch Size

The seldontest_predict model has a single predict method that runs a loop for a configurable number of iterations (default 1) to simulate work. The number of iterations can be set as a Seldon parameter, but since we want to benchmark the serialization/deserialization cost we keep the amount of work minimal.
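A minimal sketch of such a model, assuming the Seldon Python wrapper's predict(X, features_names) convention (the class and parameter names here are illustrative, not the actual seldontest_predict source):

```python
class SeldonTestPredict:
    """Sketch of a seldontest_predict-style model (an assumption,
    not the actual source)."""

    def __init__(self, iterations="1"):
        # Seldon parameters arrive as strings, so cast to int.
        self.iterations = int(iterations)

    def predict(self, X, features_names=None):
        # Spin for a configurable number of iterations to simulate work,
        # then echo the input so only (de)serialization cost remains.
        for _ in range(self.iterations):
            pass
        return X

model = SeldonTestPredict(iterations="1")
result = model.predict([0, 1, 2, 3, 4])
```

Keeping iterations at 1 means almost all measured latency comes from transport and payload (de)serialization rather than model work.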

Create payloads and associated vegeta configurations for

  1. ndarray

  2. tensor

  3. tftensor

We will create an array of 100,000 consecutive integers.
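As a sketch of the payload-creation step (the file names are illustrative, and the payload shapes assume Seldon's v1 "data" formats), the ndarray and tensor payloads could be generated like this; the tftensor payload additionally needs a serialized TensorFlow TensorProto (e.g. from tf.make_tensor_proto) and is omitted here to stay dependency-free:

```python
import json

# Large-batch payload: 100,000 consecutive integers.
values = list(range(100_000))

# Seldon v1 "data" payload shapes; file names are illustrative.
payloads = {
    "payload_ndarray.json": {"data": {"ndarray": [values]}},
    "payload_tensor.json": {"data": {"tensor": {"shape": [1, len(values)],
                                                "values": values}}},
}

for name, payload in payloads.items():
    with open(name, "w") as f:
        json.dump(payload, f)
```

These JSON files can then be referenced as request bodies in the vegeta configurations.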

Smoke test the port-forward to check everything is working
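A minimal smoke test against the port-forward might look like the following, using only the standard library; the namespace and deployment name are placeholders to replace with your own:

```python
import json
import urllib.request

# Hypothetical namespace/deployment names; adjust to your cluster.
namespace, deployment = "seldon", "seldontest"
url = (f"http://localhost:8003/seldon/{namespace}/{deployment}"
       "/api/v1.0/predictions")

# Tiny ndarray payload, just to confirm the route works end to end.
payload = json.dumps({"data": {"ndarray": [[1.0, 2.0, 3.0]]}}).encode()
req = urllib.request.Request(
    url, data=payload, headers={"Content-Type": "application/json"})

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.loads(resp.read()))
except OSError as exc:
    print(f"smoke test failed (is the port-forward running?): {exc}")
```

A successful response echoes the payload back from the model.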

Test REST

  1. ndarray

  2. tensor

  3. tftensor

This can be run locally, as the results are meant to show relative differences rather than precise timings.
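As one way to drive the REST tests, a vegeta targets file for the ndarray payload might look like this (host, namespace, deployment, and file names are assumptions matching the smoke test above):

```
POST http://localhost:8003/seldon/seldon/seldontest/api/v1.0/predictions
Content-Type: application/json
@payload_ndarray.json
```

It can then be run with something like vegeta attack -targets=targets.txt -duration=30s | vegeta report, with one targets file per payload type.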

Example results

ndarray    tensor    tftensor
19.8ms     19.7ms    16.2ms

Test gRPC

  1. ndarray

  2. tensor

  3. tftensor

Example results

ndarray    tensor    tftensor
253ms      8.4ms     5.5ms

Conclusions

  • gRPC is generally faster than REST, except for ndarray, which is much slower over gRPC and should not be used with it

  • tftensor is the fastest payload type

Test Predict Raw

Smoke test the port-forward to check everything is working

Test REST

  1. ndarray

  2. tensor

  3. tftensor

This can be run locally, as the results are meant to show relative differences rather than precise timings.

Example results

ndarray    tensor    tftensor
13.3ms     13.3ms    11.1ms

Test gRPC

  1. ndarray

  2. tensor

  3. tftensor

Example results

ndarray    tensor    tftensor
46ms       7.9ms     5.0ms

Conclusions

  • predict_raw is faster than predict, but you will need to handle serialization/deserialization yourself, which may make the two equivalent unless specific techniques can be applied for your use case.
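For reference, a predict_raw model can be sketched as follows (a hypothetical illustration: in the Seldon Python wrapper, predict_raw receives the raw request — a dict for REST — so any unpacking and repacking of the payload is your responsibility):

```python
class SeldonTestPredictRaw:
    """Sketch of a predict_raw model (hypothetical). Seldon passes the
    raw request through, skipping its own payload-to-array conversion,
    so the model handles (de)serialization itself."""

    def predict_raw(self, request):
        # `request` is the raw request (a dict for REST requests).
        # Custom (de)serialization logic would live here; this sketch
        # simply echoes the request back.
        return request

model = SeldonTestPredictRaw()
response = model.predict_raw({"data": {"ndarray": [[1, 2, 3]]}})
```

Skipping the wrapper's conversion is where the speedup comes from; if your own unpacking does equivalent work, the gain disappears.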

Test with Predict method on Small Batch Size

The seldontest_predict model has a single predict method that runs a loop for a configurable number of iterations (default 1) to simulate work. The number of iterations can be set as a Seldon parameter, but since we want to benchmark the serialization/deserialization cost we keep the amount of work minimal.

Create payloads and associated vegeta configurations for

  1. ndarray

  2. tensor

  3. tftensor

We will create a small array of consecutive integers.

Smoke test the port-forward to check everything is working

Test REST

  1. ndarray

  2. tensor

  3. tftensor

This can be run locally, as the results are meant to show relative differences rather than precise timings.

Example results

ndarray    tensor    tftensor
1.8ms      1.8ms     2.1ms

Test gRPC

  1. ndarray

  2. tensor

  3. tftensor

Example results

ndarray    tensor    tftensor
1.46ms     1.49ms    1.57ms

Conclusions

  • gRPC is generally faster than REST

  • There is very little difference between payload types, with the simpler tensor/ndarray formats probably being slightly faster
