Explore how Seldon Core 2 uses the data flow paradigm and Kafka-based streaming to improve ML model deployment with better scalability, fault tolerance, and data observability.
Seldon Core 2 is designed around the data flow paradigm. Here we will explain what that means and some of the rationale behind this choice.
The initial release of Seldon Core introduced the concept of an inference graph, which can be thought of as a sequence of operations applied to an inference request. Here is how it may look:
In reality, though, this is not how Seldon Core v1 is implemented. Instead, a Seldon deployment consists of a range of independent services that host models, transformations, detectors and explainers, plus a central orchestrator that knows the inference graph topology and makes service calls in the correct order, passing data between requests and responses as necessary. Here is how the picture looks under the hood:
While this is a convenient way of implementing an evaluation graph with microservices, it has a few problems. The orchestrator becomes a bottleneck and a single point of failure. It also hides all the data transformations needed to translate one service's response into another service's request. Data tracing and lineage become difficult. All in all, while the Seldon platform is all about processing data, the under-the-hood implementation was still focused on the order of operations and not on the data itself.
The realisation of this disparity led to a new approach to inference graph evaluation in v2, based on the data flow paradigm. Data flow is a well-known concept in software engineering, dating back to the 1960s. In contrast to services, which model programs as control flow, focusing on the order of operations, data flow models software systems as a series of connections that modify incoming data, focusing on the data flowing through the system. The particular flavor of the data flow paradigm used by v2 is known as flow-based programming (FBP). FBP defines software applications as a set of processes which exchange data via connections that are external to those processes. Connections are made via named ports, which promotes data coupling between components of the system.
Data flow design makes data in software the top priority. That is one of the key messages of the so-called "data-centric AI" idea, which is becoming increasingly popular within the ML community. Data is a key component of a successful ML project: it needs to be discovered, described, cleaned, understood, monitored and verified. Consequently, there is a growing demand for data-centric platforms and solutions. Making Seldon Core data-centric was one of the key goals of the Seldon Core 2 design.
In the context of Seldon Core, applying the FBP design approach means that the evaluation is implemented in the same shape as the inference graph itself. So instead of routing everything through a centralized orchestrator, the evaluation happens in the same graph-like manner:
As far as implementation goes, Seldon Core 2 runs on Kafka. An inference request is put onto a pipeline input topic, which triggers an evaluation. Each part of the inference graph is a service running in its own container, fronted by a model gateway. The model gateway listens to the corresponding input Kafka topic, reads data from it, calls the service and puts the received response onto an output Kafka topic. There is also a pipeline gateway that allows interacting with Seldon Core in a synchronous manner.
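As a rough sketch of this flow, the snippet below publishes an Open Inference Protocol (V2) request onto a pipeline's input topic and reads the result from its output topic. The broker address, the topic naming convention (seldon.<namespace>.pipeline.<name>.inputs/outputs), the JSON payload encoding and the pipeline/tensor names are all illustrative assumptions; verify them against your installation before relying on this.

# Illustrative sketch only: broker address, topic naming and JSON payload
# encoding are assumptions to verify against your Seldon Core 2 installation.
import json
from confluent_kafka import Consumer, Producer

BOOTSTRAP = "seldon-kafka-bootstrap:9092"                  # assumed broker address
PIPELINE = "mypipeline"                                    # hypothetical pipeline name
IN_TOPIC = f"seldon.default.pipeline.{PIPELINE}.inputs"    # assumed naming scheme
OUT_TOPIC = f"seldon.default.pipeline.{PIPELINE}.outputs"  # assumed naming scheme

# V2 inference protocol request; tensor name, shape and datatype are placeholders.
request = {
    "inputs": [
        {"name": "predict", "shape": [1, 4], "datatype": "FP32",
         "data": [[0.1, 0.2, 0.3, 0.4]]}
    ]
}

producer = Producer({"bootstrap.servers": BOOTSTRAP})
producer.produce(IN_TOPIC, value=json.dumps(request).encode("utf-8"))
producer.flush()

consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "pipeline-output-reader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([OUT_TOPIC])
msg = consumer.poll(timeout=30.0)
if msg is not None and msg.error() is None:
    print(msg.value())  # V2 inference response emitted by the pipeline
consumer.close()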
This approach gives SCv2 several important features. Firstly, Seldon Core natively supports both synchronous and asynchronous modes of operation. Asynchronicity is achieved via streaming: input data can be sent to an input topic in Kafka, and after the evaluation the output topic will contain the inference result. For those looking to use it in the v1 style, a service API is provided.
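For the synchronous path, the same V2 request can be sent over HTTP. Below is a minimal sketch, assuming the Seldon mesh/pipeline gateway is reachable at the address shown and that pipelines are routed via a Seldon-Model: <name>.pipeline header; both details are assumptions to check against your deployment.

# Minimal synchronous call sketch; the mesh URL and the "Seldon-Model" routing
# header convention are assumptions to verify for your deployment.
import requests

MESH_URL = "http://seldon-mesh.seldon-mesh:80"   # assumed service address
PIPELINE = "mypipeline"                          # hypothetical pipeline name

body = {
    "inputs": [
        {"name": "predict", "shape": [1, 4], "datatype": "FP32",
         "data": [[0.1, 0.2, 0.3, 0.4]]}
    ]
}

resp = requests.post(
    f"{MESH_URL}/v2/models/{PIPELINE}/infer",     # Open Inference Protocol endpoint
    json=body,
    headers={"Seldon-Model": f"{PIPELINE}.pipeline"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # synchronous V2 inference response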
Secondly, there is no single point of failure. Even if one or more nodes in the graph go down, the data will still be sitting on the streams waiting to be processed, and the evaluation resumes whenever the failed node comes back up.
Thirdly, data flow means intermediate data can be accessed at any arbitrary step of the graph, inspected and collected as necessary. Data lineage is possible throughout, which opens up opportunities for advanced monitoring and explainability use cases. This is a key feature for effective error surfacing in production environments as it allows:
Adding context from different parts of the graph to better understand a particular output
Reducing false positive rates of alerts as different slices of the data can be investigated
Enabling reproduction of results, as fine-grained lineage of the computation and the associated data transformations is tracked by design
Finally, the inference graph can now be extended by adding new nodes at arbitrary places, all without affecting the existing pipeline execution. This kind of flexibility was not possible with v1. It also allows multiple pipelines to share common nodes, optimising resource usage.
More details and information on data-centric AI and the data flow paradigm can be found in these resources:
Stanford MLSys seminar
A paper that explores
Flow-based programming resources from its creator J.P. Morrison:
Pathways: Asynchronous Distributed Dataflow for ML, research work from Google on the design and implementation of a data flow based orchestration layer for accelerators
Better understanding of data requires tracking its history and context



Learn how to implement outlier detection in Seldon Core using Alibi-Detect integration for model monitoring and anomaly detection.
Outlier detection models are treated like any other Model. You can run any saved Alibi-Detect outlier detection model by adding the requirement alibi-detect.
An example outlier detection model from the CIFAR10 image classification example is shown below:
# samples/models/cifar10-outlier-detect.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: cifar10-outlier
spec:
  storageUri: "gs://seldon-models/scv2/examples/mlserver_1.3.5/cifar10/outlier-detector"
  requirements:
  - mlserver
  - alibi-detect
Learn how to implement drift detection in Seldon Core 2 using Alibi-Detect integration for model monitoring and batch processing.
Drift detection models are treated like any other Model. You can run any saved Alibi-Detect drift detection model by adding the requirement alibi-detect.
An example drift detection model from the CIFAR10 image classification example is shown below:
# samples/models/cifar10-drift-detect.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: cifar10-drift
spec:
  storageUri: "gs://seldon-models/scv2/examples/mlserver_1.3.5/cifar10/drift-detector"
  requirements:
  - mlserver
  - alibi-detect
Usually you would run these models in an asynchronous part of a Pipeline, i.e. they are not connected to the output of the Pipeline which defines the synchronous path. For example, the CIFAR-10 image detection example uses a pipeline as shown below:
# samples/pipelines/cifar10.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: cifar10-production
spec:
  steps:
  - name: cifar10
  - name: cifar10-outlier
  - name: cifar10-drift
    batch:
      size: 20
  output:
    steps:
    - cifar10
    - cifar10-outlier.outputs.is_outlier
Note how the cifar10-drift model is not part of the path to the outputs. Drift alerts can be read from the Kafka topic of the model.
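One way to read those alerts is to consume that topic directly. The sketch below assumes the default topic naming seldon.<namespace>.model.<model-name>.outputs and simply prints whatever the detector publishes; the exact layout of the response (for example an is_drift output tensor) depends on the Alibi-Detect runtime, so inspect the messages rather than assuming a schema.

# Hedged sketch: consume the drift detector's output topic. The broker address
# and the topic name below follow an assumed naming scheme; adjust as needed.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "seldon-kafka-bootstrap:9092",   # assumed broker address
    "group.id": "drift-alert-reader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["seldon.default.model.cifar10-drift.outputs"])  # assumed topic

try:
    while True:
        msg = consumer.poll(timeout=5.0)
        if msg is None or msg.error():
            continue
        # Each message is the detector's V2 inference response; its output
        # tensors (e.g. a drift flag) can be parsed and alerted on downstream.
        print(msg.value())
finally:
    consumer.close()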
Learn how to run performance tests for Seldon Core 2 deployments, including load testing, benchmarking, and analyzing inference latency and throughput metrics.
This section describes how a user can run performance tests to understand the limits of a particular SCv2 deployment.
The base directory is tests/k6.
k6 is used to drive requests for load, unload and infer workloads. It is recommended that the load test is run within the same cluster that has SCv2 installed, as it requires internal access to some services that are not automatically exposed to the outside world. Furthermore, having the driver within the same cluster minimises link latency to the SCv2 entrypoint, so infer latencies are more representative of the actual overheads of the system.
Envoy: tests synchronous inference requests via Envoy.
To run: make deploy-envoy-test
Agent: tests inference requests sent directly to a specific agent; defaults to triton-0 or mlserver-0.
To run: make deploy-rproxy-test or make deploy-rproxy-mlserver-test
Server: tests inference requests sent directly to a specific server (bypassing the agent); defaults to triton-0 or mlserver-0.
To run: make deploy-server-test or make deploy-server-mlserver-test
Pipeline gateway (HTTP-Kafka gateway): tests HTTP and gRPC inference requests to a one-node pipeline.
To run: make deploy-kpipeline-test
Model gateway (Kafka-HTTP gateway): tests inference requests to a model via Kafka.
To run: make deploy-kmodel-test
One way to look at the results is to check the log of the pod that executed the Kubernetes job.
Results can also be persisted to a GCS bucket; a service account key k6-sa-key in the same namespace is required.
Users can also look at the metrics exposed in Prometheus while the test is underway.
In case a user is modifying the actual scenario of the test:
export DOCKERHUB_USERNAME=mydockerhubaccount
Build the k6 image via make build-push.
In the same shell environment, deploying jobs will then use this custom-built Docker image.
Users can modify the settings of the tests in tests/k6/configs/k8s/base/k6.yaml (reproduced further below). This will apply to all subsequent tests that are deployed using the above process.
Some settings that can be changed:
k6 args (for a full list, check the k6 documentation)
Environment variables (for MODEL_TYPE, choose from the model types defined in tests/k6/components/model.js, reproduced further below)
Learn how to implement model explainability in Seldon Core using Alibi-Explain integration for black box model explanations and pipeline insights.
Explainers are Model resources with some extra settings. They allow a range of explainers from the Alibi-Explain library to be run on MLServer.
An example Anchors explainer definition is shown below.
The key additions are:
type: This must be one of the Alibi Explainer types supported by the Alibi Explain runtime in MLServer.
modelRef: The model name for black box explainers.
pipelineRef: The pipeline name for black box explainers.
Only one of modelRef and pipelineRef is allowed.
Blackbox explainers can explain a Pipeline as well as a model. An example of each is shown below.
# samples/models/income-explainer.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: income-explainer
spec:
  storageUri: "gs://seldon-models/scv2/samples/mlserver_1.5.0/income-sklearn/anchor-explainer"
  explainer:
    type: anchor_tabular
    modelRef: income
# tests/k6/configs/k8s/base/k6.yaml
args: [
  "--no-teardown",
  "--summary-export",
  "results/base.json",
  "--out",
  "csv=results/base.gz",
  "-u",
  "5",
  "-i",
  "100000",
  "-d",
  "120m",
  "scenarios/infer_constant_vu.js",
]
# # infer_constant_rate
# args: [
# "--no-teardown",
# "--summary-export",
# "results/base.json",
# "--out",
# "csv=results/base.gz",
# "scenarios/infer_constant_rate.js",
# ]
# # k8s-test-script
# args: [
# "--summary-export",
# "results/base.json",
# "--out",
# "csv=results/base.gz",
# "scenarios/k8s-test-script.js",
# ]
# # core2_qa_control_plane_ops
# args: [
# "--no-teardown",
# "--verbose",
# "--summary-export",
# "results/base.json",
# "--out",
# "csv=results/base.gz",
# "-u",
# "5",
# "-i",
# "10000",
# "-d",
# "9h",
# "scenarios/core2_qa_control_plane_ops.js",
# ]
- name: INFER_HTTP_ITERATIONS
  value: "1"
- name: INFER_GRPC_ITERATIONS
  value: "1"
- name: MODELNAME_PREFIX
  value: "tfsimplea,pytorch-cifar10a,tfmnista,mlflow-winea,irisa"
- name: MODEL_TYPE
  value: "tfsimple,pytorch_cifar10,tfmnist,mlflow_wine,iris"
# Specify MODEL_MEMORY_BYTES using unit-of-measure suffixes (k, M, G, T)
# rather than numbers without units of measure. If supplying "naked
# numbers", the seldon operator will take care of converting the number
# for you but also take ownership of the field (as FieldManager), so the
# next time you run the scenario creating/updating of the model CR will
# fail.
- name: MODEL_MEMORY_BYTES
  value: "400k,8M,43M,200k,3M"
- name: MAX_MEM_UPDATE_FRACTION
  value: "0.1"
- name: MAX_NUM_MODELS
  value: "800,100,25,100,100"
  # value: "0,0,25,100,100"
#
# MAX_NUM_MODELS_HEADROOM is a variable used by control-plane tests.
# It's the approximate number of models that can be created over
# MAX_NUM_MODELS over the experiment. In the worst case scenario
# (very unlikely) the HEADROOM values may temporarily exceed the ones
# specified here with the number of VUs, because each VU checks the
# headroom constraint independently before deciding on the available
# operations (no communication/sync between VUs).
# - name: MAX_NUM_MODELS_HEADROOM
#   value: "20,5,0,20,30"
#
# MAX_MODEL_REPLICAS is used by control-plane tests. It controls the
# maximum number of replicas that may be requested when
# creating/updating models of a given type.
# - name: MAX_MODEL_REPLICAS
#   value: "2,2,0,2,2"
#
- name: INFER_BATCH_SIZE
  value: "1,1,1,1,1"
# MODEL_CREATE_UPDATE_DELETE_BIAS defines the probability ratios between
# the operations, for control-plane tests. For example, "1, 4, 3"
# makes an Update four times more likely than a Create, and a Delete 3
# times more likely than the Create.
# - name: MODEL_CREATE_UPDATE_DELETE_BIAS
#   value: "1,3,1"
- name: WARMUP
  value: "false"
// tests/k6/components/model.js
import { dump as yamlDump } from "https://cdn.jsdelivr.net/npm/js-yaml/dist/js-yaml.mjs";
import { getConfig } from '../components/settings.js'
const tfsimple_string = "tfsimple_string"
const tfsimple = "tfsimple"
const iris = "iris" // mlserver
const pytorch_cifar10 = "pytorch_cifar10"
const tfmnist = "tfmnist"
const tfresnet152 = "tfresnet152"
const onnx_gpt2 = "onnx_gpt2"
const mlflow_wine = "mlflow_wine" // mlserver
const add10 = "add10" // https://github.com/SeldonIO/triton-python-examples/tree/master/add10
const sentiment = "sentiment" // mlserver
# samples/models/hf-sentiment-explainer.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: sentiment-explainer
spec:
  storageUri: "gs://seldon-models/scv2/examples/huggingface/speech-sentiment/explainer"
  explainer:
    type: anchor_text
    pipelineRef: sentiment-explain