simpler tensor/ndarray is better for small batch size
Test with Predict method on Large Batch Size
The seldontest_predict simply has a predict method that runs a loop with a configurable number of iterations (default 1) to simulate work. The iterations can be set as a Seldon parameter, but here we want to benchmark the serialization/deserialization cost, so we keep the amount of work minimal.
Create payloads and associated vegeta configurations for:
ndarray
tensor
tftensor
We will create an array of 100,000 consecutive integers.
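As a rough sketch of the payload step, the snippet below builds the ndarray and tensor request bodies and wraps one of them in a vegeta JSON-format target. The function names, the flat one-dimensional shape and the URL passed in are assumptions for illustration and are not taken from the notebook. The tftensor payload is left out because it needs TensorFlow to build the TensorProto.

```python
import base64
import json

SIZE = 100_000  # array length stated in the text


def make_payloads(size=SIZE):
    """Return JSON strings for the ndarray and tensor payload types."""
    values = list(range(size))  # consecutive integers 0..size-1
    # Assumption: a flat one-dimensional array; the real shape may differ.
    ndarray = {"data": {"ndarray": values}}
    tensor = {"data": {"tensor": {"shape": [size], "values": values}}}
    return {"ndarray": json.dumps(ndarray), "tensor": json.dumps(tensor)}


def vegeta_target(url, body):
    """Build one vegeta JSON-format target; the body is base64 encoded."""
    return json.dumps(
        {
            "method": "POST",
            "url": url,  # hypothetical endpoint supplied by the caller
            "header": {"Content-Type": ["application/json"]},
            "body": base64.b64encode(body.encode("utf-8")).decode("ascii"),
        }
    )
```

Each target string can be written to its own file and passed to vegeta with the JSON target format, one file per payload type.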
Smoke test port-forward to check everything is working
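A minimal sketch of what a REST smoke test request could look like is shown below. The host, port and deployment name are hypothetical placeholders; the path follows the usual Seldon Core v1 ingress layout, and the seldon namespace is the one created in this notebook. The function only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_smoke_request(host, namespace, deployment, payload):
    """Build (but do not send) a POST request for a Seldon REST prediction."""
    url = f"http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and deployment name, assuming a local port-forward.
request = build_smoke_request(
    "localhost:8003", "seldon", "my-deployment", {"data": {"ndarray": [[1, 2, 3]]}}
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it once the port-forward is running.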
Test REST
ndarray
tensor
tftensor
This can be done locally as the results should be indicative of the relative differences rather than very accurate timings.
Example results
| ndarray | tensor | tftensor |
| --- | --- | --- |
| 19.8ms | 19.7ms | 16.2ms |
Test gRPC
ndarray
tensor
tftensor
Example results
| ndarray | tensor | tftensor |
| --- | --- | --- |
| 253ms | 8.4ms | 5.5ms |
Conclusions
gRPC is generally faster than REST, except for ndarray, which is much slower over gRPC (253ms versus 19.8ms over REST) and should not be used with it.
tftensor is fastest
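To make the comparison behind these conclusions concrete, the short sketch below computes the REST to gRPC latency ratio for each payload type from the example results. The helper name is an assumption, and the numbers are only the example timings reported here, so the ratios are indicative rather than precise.

```python
# Latencies in milliseconds, copied from the example results for the
# large batch predict test.
rest = {"ndarray": 19.8, "tensor": 19.7, "tftensor": 16.2}
grpc = {"ndarray": 253.0, "tensor": 8.4, "tftensor": 5.5}


def rest_over_grpc(rest_ms, grpc_ms):
    """Ratio above 1 means gRPC is faster; below 1 means REST is faster."""
    return {name: rest_ms[name] / grpc_ms[name] for name in rest_ms}


ratios = rest_over_grpc(rest, grpc)
for name, ratio in ratios.items():
    print(f"{name}: REST/gRPC latency ratio = {ratio:.2f}")
```

With these figures gRPC is a little over twice as fast for tensor and nearly three times as fast for tftensor, while ndarray is roughly thirteen times slower over gRPC than over REST.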
Test Predict Raw
Smoke test port-forward to check everything is working
Test REST
ndarray
tensor
tftensor
This can be done locally as the results should be indicative of the relative differences rather than very accurate timings.
Example results
| ndarray | tensor | tftensor |
| --- | --- | --- |
| 13.3ms | 13.3ms | 11.1ms |
Test gRPC
ndarray
tensor
tftensor
Example results
| ndarray | tensor | tftensor |
| --- | --- | --- |
| 46ms | 7.9ms | 5.0ms |
Conclusions
predict_raw is faster than predict, but you will need to handle the serialization/deserialization yourself, which may make the two equivalent unless specific techniques can be applied for your use case.
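To illustrate what handling the serialization yourself can involve, here is a minimal sketch of a predict_raw method that unpacks a REST payload and echoes it back. The class name is hypothetical and this is not the model used in the benchmark. It assumes the REST case, where the message arrives as the decoded JSON dict; a gRPC request would arrive as a protobuf message and is not handled here.

```python
class RawEchoModel:
    """Hypothetical model class, not the notebook's actual test model."""

    def predict_raw(self, msg):
        # Assumption: msg is the decoded JSON body of a REST request.
        data = msg["data"]
        if "ndarray" in data:
            # Nested lists are already plain Python objects.
            return {"data": {"ndarray": data["ndarray"]}}
        if "tensor" in data:
            tensor = data["tensor"]
            # Shape and flat values must be read and rebuilt by hand.
            return {
                "data": {
                    "tensor": {"shape": tensor["shape"], "values": tensor["values"]}
                }
            }
        raise ValueError("unsupported payload type: " + ", ".join(data))
```

Any conversion to a numeric array, and back again for the response, would sit inside these branches, which is where the saving over predict can shrink.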
Test with Predict method on Small Batch Size
The seldontest_predict simply has a predict method that runs a loop with a configurable number of iterations (default 1) to simulate work. The iterations can be set as a Seldon parameter, but here we want to benchmark the serialization/deserialization cost, so we keep the amount of work minimal.
Create payloads and associated vegeta configurations for:
ndarray
tensor
tftensor
We will create an array of 100,000 consecutive integers.
Smoke test port-forward to check everything is working
Test REST
ndarray
tensor
tftensor
This can be done locally as the results should be indicative of the relative differences rather than very accurate timings.
Example results
| ndarray | tensor | tftensor |
| --- | --- | --- |
| 1.8ms | 1.8ms | 2.1ms |
Test gRPC
ndarray
tensor
tftensor
Example results
| ndarray | tensor | tftensor |
| --- | --- | --- |
| 1.46ms | 1.49ms | 1.57ms |
Conclusions
gRPC is generally faster than REST
There is very little difference between payload types; the simpler ndarray and tensor payloads are slightly faster than tftensor in these results.
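from IPython.core.magic import register_line_cell_magic


@register_line_cell_magic
def writetemplate(line, cell):
    # Write the cell body to the file named on the magic line,
    # filling {name} placeholders from the notebook's global variables.
    with open(line, "w") as f:
        f.write(cell.format(**globals()))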
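The substitution done by the writetemplate magic can be tried outside IPython with plain `str.format`. The sketch below uses a hypothetical template and image name; the helper takes an explicit namespace where the magic passes `globals()`. Note that literal braces in a template, such as inline JSON, must be doubled or `format` will treat them as placeholders.

```python
def render(template, namespace):
    """Fill {name} placeholders the same way the magic does with globals()."""
    return template.format(**namespace)


VERSION = "1.10.0-dev"  # value shown in the notebook output

# Hypothetical template; the doubled braces become literal { and }.
template = 'image: example-registry/model:{VERSION}\nparams: {{"iterations": 1}}\n'

rendered = render(template, {"VERSION": VERSION})
print(rendered)
```

A placeholder with no matching variable raises `KeyError`, so a template cell fails fast if a variable such as VERSION has not been defined yet.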
VERSION = !cat ../../../version.txt
VERSION = VERSION[0]
VERSION
'1.10.0-dev'
!kubectl create namespace seldon
Error from server (AlreadyExists): namespaces "seldon" already exists