Kafka Payload Logging
This notebook illustrates testing your model with Kafka payload logging.
Prerequisites
An authenticated Kubernetes cluster with Istio and Seldon Core installed
You can use the ansible seldon-core and kafka playbooks in the root ansible folder.
The vegeta and ghz benchmarking tools
Port-forward to the Istio ingress gateway:
kubectl port-forward $(kubectl get pods -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].metadata.name}') -n istio-system 8003:8080
Tested on GKE with 6 e2-standard-32 nodes (32 vCPUs each)
from IPython.core.magic import register_line_cell_magic
@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, "w") as f:
        f.write(cell.format(**globals()))
VERSION = !cat ../../../version.txt
VERSION = VERSION[0]
VERSION
!kubectl create namespace seldon
CIFAR10 Model running on Triton Inference Server
We run the CIFAR10 image classification model on a Triton Inference Server, configured to allow the model 5 CPUs.
%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: cifar10
  namespace: seldon
spec:
  name: resnet32
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: cifar10
          resources:
            requests:
              cpu: 5
            limits:
              cpu: 5
    graph:
      implementation: TRITON_SERVER
      logger:
        mode: all
      modelUri: gs://seldon-models/triton/tf_cifar10_5cpu
      name: cifar10
    name: default
    svcOrchSpec:
      env:
      - name: LOGGER_KAFKA_BROKER
        value: seldon-kafka-plain-0.kafka:9092
      - name: LOGGER_KAFKA_TOPIC
        value: seldon
      - name: GOMAXPROCS
        value: "2"
      resources:
        requests:
          memory: "3G"
          cpu: 2
        limits:
          memory: "3G"
          cpu: 2
    replicas: 15
  protocol: kfserving
!kubectl apply -f model.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon
!curl -X POST -H 'Content-Type: application/json' \
-d '@./truck-v2.json' \
http://localhost:8003/seldon/seldon/cifar10/v2/models/cifar10/infer
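The same V2 inference request can be built in Python rather than curl. This is a minimal sketch: the input tensor name (`input_1`) and the random pixel values are assumptions for illustration, whereas `truck-v2.json` carries a real CIFAR10 image.

```python
import random

def build_v2_request(batch=1, name="input_1"):
    """Build a KFServing V2 inference payload for a CIFAR10-shaped input.

    The tensor name is an assumption; check the Triton model config for
    the real input name.
    """
    shape = [batch, 32, 32, 3]
    n = batch * 32 * 32 * 3
    data = [random.random() for _ in range(n)]  # stand-in pixel values
    return {
        "inputs": [
            {"name": name, "datatype": "FP32", "shape": shape, "data": data}
        ]
    }

payload = build_v2_request()

# To send it (assuming the port-forward above is active):
#   import requests
#   r = requests.post(
#       "http://localhost:8003/seldon/seldon/cifar10/v2/models/cifar10/infer",
#       json=payload)
#   print(r.json())
```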
Direct Tests to Validate Setup
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_cifar10.json |
vegeta report -type=text
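With some traffic flowing, the logged payloads can be inspected directly from the `seldon` topic. A minimal sketch, assuming `kafka-python` is installed and the broker is reachable from where this runs (e.g. via a port-forward to the Kafka service); the CloudEvent header names shown are illustrative, not authoritative.

```python
import json

def summarise_payload(headers, value):
    """Summarise one payload-log message: its CloudEvent type and body keys."""
    hdrs = {k: (v.decode() if isinstance(v, (bytes, bytearray)) else v)
            for k, v in (headers or [])}
    body = json.loads(value)
    return {
        "ce_type": hdrs.get("ce-type", "<unknown>"),  # request vs response
        "keys": sorted(body.keys()),
    }

if __name__ == "__main__":
    from kafka import KafkaConsumer  # pip install kafka-python
    consumer = KafkaConsumer(
        "seldon",
        bootstrap_servers="localhost:9092",  # adjust to your port-forward
        auto_offset_reset="earliest",
        consumer_timeout_ms=10000,
    )
    for msg in consumer:
        print(summarise_payload(msg.headers, msg.value))
```

With `logger.mode: all` set above, both request and response payloads should appear on the topic.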
Run Vegeta Benchmark
!kubectl create -f configmap_cifar10.yaml -n seldon
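The configmap above wraps a vegeta targets file. Vegeta's JSON target format expects one JSON object per line with the request body base64-encoded; a sketch of generating such a file from the sample payload (the URL matches the port-forward above, and the file names are the ones used in this notebook):

```python
import base64
import json

def vegeta_target(url, body_bytes):
    """Build one vegeta JSON-format target with a base64-encoded body."""
    return {
        "method": "POST",
        "url": url,
        "body": base64.b64encode(body_bytes).decode(),
        "header": {"Content-Type": ["application/json"]},
    }

if __name__ == "__main__":
    body = open("truck-v2.json", "rb").read()
    target = vegeta_target(
        "http://localhost:8003/seldon/seldon/cifar10/v2/models/cifar10/infer",
        body)
    with open("vegeta_cifar10.json", "w") as f:
        f.write(json.dumps(target) + "\n")
```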
workers = 10
duration = "300s"
%%writetemplate job-vegeta-cifar10.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cifar10-loadtest
spec:
  backoffLimit: 6
  parallelism: 16
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
      - args:
        - vegeta -cpus=1 attack -format=json -keepalive=false -duration={duration} -rate=0 -max-workers={workers} -targets=/var/vegeta/cifar10.json | vegeta report -type=text
        command:
        - sh
        - -c
        image: peterevans/vegeta:latest
        imagePullPolicy: Always
        name: vegeta
        volumeMounts:
        - mountPath: /var/vegeta
          name: vegeta-cfg
      restartPolicy: Never
      volumes:
      - configMap:
          defaultMode: 420
          name: vegeta-cfg
        name: vegeta-cfg
!kubectl create -f job-vegeta-cifar10.yaml -n seldon
!kubectl wait --for=condition=complete job/cifar10-loadtest -n seldon
!kubectl delete -f job-vegeta-cifar10.yaml -n seldon
!kubectl delete -f model.yaml
Summary
By looking at the Kafka Grafana monitoring, one can inspect the achieved message rate.
You can port-forward to it with:
kubectl port-forward svc/kafka-grafana -n kafka 3000:80
The default login and password are both set to admin.
With the above deployment and test we see around 3K predictions per second, resulting in 6K Kafka messages per second, since `logger.mode: all` logs both the request and the response for each prediction.