Data Science Monitoring

Dataflow with Kafka

Explore how Seldon Core 2 uses the data flow paradigm and Kafka-based streaming to improve ML model deployment with better scalability, fault tolerance, and data observability.

Seldon Core 2 is designed around the data flow paradigm. Here we will explain what that means and some of the rationale behind this choice.

Seldon Core v1

The initial release of Seldon Core introduced the concept of an inference graph, which can be thought of as a sequence of operations applied to an inference request.


In reality, though, this is not how Seldon Core v1 is implemented. Instead, a Seldon deployment consists of a range of independent services that host models, transformations, detectors and explainers, and a central orchestrator that knows the inference graph topology and makes service calls in the correct order, passing data between requests and responses as necessary.

While this is a convenient way of implementing an evaluation graph with microservices, it has a few problems. The orchestrator becomes a bottleneck and a single point of failure. It also hides all the data transformations that are needed to translate one service's response into another service's request. Data tracing and lineage become difficult. All in all, while the Seldon platform is all about processing data, the under-the-hood implementation remained focused on the order of operations rather than on the data itself.

Data flow

The realisation of this disparity led to a new approach to inference graph evaluation in v2, based on the data flow paradigm. Data flow is a well-known concept in software engineering, dating back to the 1960s. In contrast to the service approach, which models programs as control flow and focuses on the order of operations, data flow models software systems as a series of connections that modify incoming data, focusing on the data flowing through the system. The particular flavor of the data flow paradigm used by v2 is known as flow-based programming (FBP). FBP defines software applications as a set of processes which exchange data via connections that are external to those processes. Connections are made via named ports, which promotes data coupling between the components of the system.

Data flow design makes data in software the top priority. That is one of the key messages of the so-called "data-centric AI" idea, which is becoming increasingly popular within the ML community. Data is a key component of a successful ML project. Data needs to be discovered, described, cleaned, understood, monitored and verified. Consequently, there is a growing demand for data-centric platforms and solutions. Making Seldon Core data-centric was one of the key goals of the Seldon Core 2 design.

Seldon Core 2

In the context of Seldon Core, applying the FBP design approach means that the evaluation is implemented in the same shape as the inference graph itself. Instead of routing everything through a centralized orchestrator, the evaluation happens in the same graph-like manner.

As far as the implementation goes, Seldon Core 2 runs on Kafka. An inference request is put onto a pipeline input topic, which triggers an evaluation. Each part of the inference graph is a service running in its own container, fronted by a model gateway. The model gateway listens to the corresponding input Kafka topic, reads data from it, calls the service and puts the received response onto an output Kafka topic. There is also a pipeline gateway that allows interacting with Seldon Core in a synchronous manner.
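
To make the asynchronous path concrete, here is a minimal client-side sketch using Python and kafka-python. It is illustrative only: the broker address, the seldon.<namespace>.pipeline.<name>.inputs/.outputs topic naming and the lack of any correlation headers are assumptions that depend on how Kafka and Seldon Core 2 are installed and configured.

    # sketch: produce a request to a pipeline input topic and read the result (hypothetical, Python + kafka-python)
    import json
    from kafka import KafkaConsumer, KafkaProducer

    BROKER = "seldon-kafka-bootstrap:9092"                   # assumed broker address
    INPUTS = "seldon.default.pipeline.mypipeline.inputs"     # assumed topic naming
    OUTPUTS = "seldon.default.pipeline.mypipeline.outputs"

    # Open Inference Protocol (v2) request body, the format Seldon Core 2 models expect.
    request = {
        "inputs": [
            {"name": "predict", "shape": [1, 4], "datatype": "FP32",
             "data": [[1.0, 2.0, 3.0, 4.0]]}
        ]
    }

    producer = KafkaProducer(bootstrap_servers=BROKER,
                             value_serializer=lambda v: json.dumps(v).encode("utf-8"))
    producer.send(INPUTS, request)
    producer.flush()

    # A real client would correlate requests and responses (e.g. via Kafka message
    # headers); here we simply print whatever arrives on the output topic.
    consumer = KafkaConsumer(OUTPUTS, bootstrap_servers=BROKER, auto_offset_reset="latest")
    for message in consumer:
        print(json.loads(message.value))
        break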

This approach gives SCv2 several important features. Firstly, Seldon Core natively supports both synchronous and asynchronous modes of operation. Asynchronicity is achieved via streaming: input data can be sent to an input topic in Kafka, and after the evaluation the output topic will contain the inference result. For those looking to use it in the v1 style, a service API is provided.
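
A v1-style synchronous call goes through the pipeline gateway instead. The sketch below assumes a typical Kubernetes installation where the seldon-mesh service is reachable and pipelines are addressed with a seldon-model: <pipeline>.pipeline header; treat the host and headers as assumptions and adjust them for your setup.

    # sketch: synchronous inference against a pipeline (hypothetical, Python + requests)
    import requests

    SELDON_MESH = "http://seldon-mesh.seldon-mesh.svc.cluster.local"   # assumed service address
    PIPELINE = "mypipeline"

    response = requests.post(
        f"{SELDON_MESH}/v2/models/{PIPELINE}/infer",
        headers={"seldon-model": f"{PIPELINE}.pipeline"},   # routes the request to the pipeline
        json={"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32",
                          "data": [[1.0, 2.0, 3.0, 4.0]]}]},
        timeout=10,
    )
    response.raise_for_status()
    print(response.json()["outputs"])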

Secondly, there is no single point of failure. Even if one or more nodes in the graph go down, the data will still be sitting on the streams waiting to be processed, and the evaluation resumes whenever the failed node comes back up.

Thirdly, data flow means intermediate data can be accessed at any arbitrary step of the graph, inspected and collected as necessary. Data lineage is possible throughout, which opens up opportunities for advanced monitoring and explainability use cases. This is a key feature for effective error surfacing in production environments as it allows:

  • Adding context from different parts of the graph to better understand a particular output

  • Reducing false positive rates of alerts as different slices of the data can be investigated

  • Enabling reproduction of results, as fine-grained lineage of the computation and the associated data transformations is tracked by design

Finally, the inference graph can now be extended by adding new nodes at arbitrary places, all without affecting pipeline execution. This kind of flexibility was not possible with v1. It also allows multiple pipelines to share common nodes, thereby optimising resource usage.

References

More details and information on data-centric AI and the data flow paradigm can be found in these resources:

  • Stanford MLSys seminar "What can Data-Centric AI Learn from Data and ML Engineering?"

  • A paper that explores data flow in an ML deployment context

  • Introduction to flow based programming from its creator J.P. Morrison

  • Pathways: Asynchronous Distributed Dataflow for ML, research work from Google on the design and implementation of a data flow based orchestration layer for accelerators

  • Better understanding of data requires tracking its history and context

  • Data-centric AI Resource Hub

    Outlier Detection

    Learn how to implement outlier detection in Seldon Core using Alibi-Detect integration for model monitoring and anomaly detection.

    Outlier detection models are treated as any other Model. You can run any saved Alibi-Detect outlier detection model by adding the requirement alibi-detect.

    An example outlier detection model from the CIFAR10 image classification example is shown below:

    # samples/models/cifar10-outlier-detect.yaml
    apiVersion: mlops.seldon.io/v1alpha1
    kind: Model
    metadata:
      name: cifar10-outlier
    spec:
      storageUri: "gs://seldon-models/scv2/examples/mlserver_1.3.5/cifar10/outlier-detector"
      requirements:
        - mlserver
        - alibi-detect
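
    Once deployed, the detector is served like any other model and can be queried directly over the Open Inference Protocol; the outlier decision comes back in the response outputs. The sketch below is illustrative only: the service address, routing header, input tensor name and output tensor names are assumptions that depend on your installation and on the saved detector.

    # sketch: query the outlier detector directly (hypothetical, Python + requests)
    import requests

    SELDON_MESH = "http://seldon-mesh.seldon-mesh.svc.cluster.local"   # assumed service address

    response = requests.post(
        f"{SELDON_MESH}/v2/models/cifar10-outlier/infer",
        headers={"seldon-model": "cifar10-outlier"},        # route to the model
        json={"inputs": [{"name": "input", "shape": [1, 32, 32, 3], "datatype": "FP32",
                          "data": [[[[0.0] * 3] * 32] * 32]}]},   # a single blank CIFAR10-sized image
        timeout=10,
    )
    response.raise_for_status()
    # The detector exposes tensors such as is_outlier (used by the pipeline in the
    # Drift Detection section as cifar10-outlier.outputs.is_outlier).
    for output in response.json()["outputs"]:
        print(output["name"], output["data"])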

    Examples

    CIFAR10 image classification with outlier detector
    Tabular income classification model with outlier detector

    Drift Detection

    Learn how to implement drift detection in Seldon Core 2 using Alibi-Detect integration for model monitoring and batch processing.

    Drift detection models are treated as any other Model. You can run any saved Alibi-Detect drift detection model by adding the requirement alibi-detect.

    An example drift detection model from the CIFAR10 image classification example is shown below:

    # samples/models/cifar10-drift-detect.yaml
    apiVersion: mlops.seldon.io/v1alpha1
    kind: Model
    metadata:
      name: cifar10-drift
    spec:
      storageUri: "gs://seldon-models/scv2/examples/mlserver_1.3.5/cifar10/drift-detector"
      requirements:
        - mlserver
        - alibi-detect

    Usually you would run these models in an asynchronous part of a Pipeline, i.e. they are not connected to the output of the Pipeline, which defines the synchronous path. For example, the CIFAR10 image classification example uses the pipeline shown below:

    # samples/pipelines/cifar10.yaml
    apiVersion: mlops.seldon.io/v1alpha1
    kind: Pipeline
    metadata:
      name: cifar10-production
    spec:
      steps:
        - name: cifar10
        - name: cifar10-outlier
        - name: cifar10-drift
          batch:
            size: 20
      output:
        steps:
        - cifar10
        - cifar10-outlier.outputs.is_outlier

    Note how the cifar10-drift model is not part of the path to the outputs. Drift alerts can be read from the Kafka topic of the model.
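
    As an illustration, drift events can be consumed straight from that topic. This is a minimal sketch assuming a topic naming scheme of the form seldon.<namespace>.model.<model>.outputs, a JSON-encoded payload and a reachable broker; the topic name, the encoding (it may also be a protobuf-encoded inference response) and the is_drift output name should be verified against your installation.

    # sketch: consume drift alerts from the detector's output topic (hypothetical, Python + kafka-python)
    import json
    from kafka import KafkaConsumer

    BROKER = "seldon-kafka-bootstrap:9092"                  # assumed broker address
    TOPIC = "seldon.default.model.cifar10-drift.outputs"    # assumed topic naming

    consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER, auto_offset_reset="earliest")
    for message in consumer:
        event = json.loads(message.value)
        # Each event is an inference response; the drift decision is typically
        # reported in an output tensor such as "is_drift".
        for output in event.get("outputs", []):
            print(output["name"], output["data"])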

    Examples

    CIFAR10 image classification with drift detector
    Tabular income classification model with drift detector

    Model Performance Metrics

    Learn how to run performance tests for Seldon Core 2 deployments, including load testing, benchmarking, and analyzing inference latency and throughput metrics.

    This section describes how a user can run performance tests to understand the limits of a particular SCv2 deployment.

    The base directory is tests/k6.

    Driver

    k6 is used to drive requests for load, unload and infer workloads. It is recommended that the load test is run within the same cluster that has SCv2 installed, as it requires internal access to some of the services that are not automatically exposed to the outside world. Furthermore, having the driver within the same cluster minimises link latency to the SCv2 entrypoint, so infer latencies are more representative of the actual overheads of the system.

    Tests

    • Envoy: tests synchronous inference requests via Envoy

    To run: make deploy-envoy-test

    • Agent: tests inference requests sent directly to a specific agent; defaults to triton-0 or mlserver-0

    To run: make deploy-rproxy-test or make deploy-rproxy-mlserver-test

    • Server: tests inference requests sent directly to a specific server (bypassing the agent); defaults to triton-0 or mlserver-0

    To run: make deploy-server-test or make deploy-server-mlserver-test

    • Pipeline gateway (HTTP-Kafka gateway): tests HTTP and gRPC inference requests to a one-node pipeline

    To run: make deploy-kpipeline-test

    • Model gateway (Kafka-HTTP gateway): tests inference requests to a model via Kafka

    To run: make deploy-kmodel-test

    Results

    One way to look at the results is to check the log of the pod that executed the Kubernetes job.

    Results can also be persisted to a Google Cloud Storage bucket; a service account k6-sa-key in the same namespace is required.

    Users can also look at the metrics that are exposed in Prometheus while the test is underway.

    Building k6 image

    If a user is modifying the actual scenario of the test:

    • export DOCKERHUB_USERNAME=mydockerhubaccount

    • build the k6 image via make build-push

    • in the same shell environment, deploying jobs will use this custom-built Docker image

    Modifying tests

    Users can modify the settings of the tests in tests/k6/configs/k8s/base/k6.yaml (reproduced at the end of this page). This will apply to all subsequent tests that are deployed using the above process.

    Settings

    Some settings that can be changed:

    • k6 args

    For the full list of supported arguments, check the k6 documentation. The defaults used here are set in the args section of that file.

    • Environment variables

      • for MODEL_TYPE, choose from the model types defined in tests/k6/components/model.js (also reproduced at the end of this page): tfsimple_string, tfsimple, iris, pytorch_cifar10, tfmnist, tfresnet152, onnx_gpt2, mlflow_wine, add10, sentiment

    Explainability

    Learn how to implement model explainability in Seldon Core using Alibi-Explain integration for black box model explanations and pipeline insights.

    Explainers are Model resources with some extra settings. They allow a range of explainers from the Alibi-Explain library to be run on MLServer.

    An example Anchors explainer definition is shown below.

    The key additions are:

    • type: This must be one of the Alibi Explainer types supported by the Alibi Explain runtime in MLServer.

    • modelRef: The model name for black box explainers.

    • pipelineRef: The pipeline name for black box explainers.

    Only one of modelRef and pipelineRef is allowed.
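
    Once deployed, an explainer is itself served as a model and is queried over the Open Inference Protocol, with the explanation returned in the response outputs. The sketch below is illustrative only: it assumes the income-explainer from the example that follows, a typical seldon-mesh address, an income-style tabular input and that the explanation comes back as a single JSON-encoded string tensor; check the MLServer Alibi-Explain runtime documentation for the exact request and response format.

    # sketch: request an explanation from a deployed explainer (hypothetical, Python + requests)
    import json
    import requests

    SELDON_MESH = "http://seldon-mesh.seldon-mesh.svc.cluster.local"   # assumed service address

    response = requests.post(
        f"{SELDON_MESH}/v2/models/income-explainer/infer",
        headers={"seldon-model": "income-explainer"},   # route to the explainer model
        json={"inputs": [{"name": "predict", "shape": [1, 12], "datatype": "INT64",
                          "data": [[47, 4, 1, 1, 1, 3, 4, 1, 0, 0, 40, 9]]}]},
        timeout=30,
    )
    response.raise_for_status()
    # Assumed response shape: a single output whose first element is the Alibi
    # explanation serialised as JSON, with the anchor rules under data.anchor.
    explanation = json.loads(response.json()["outputs"][0]["data"][0])
    print(explanation["data"]["anchor"])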

    The example Anchor Tabular explainer definition referenced above:

    # samples/models/income-explainer.yaml
    apiVersion: mlops.seldon.io/v1alpha1
    kind: Model
    metadata:
      name: income-explainer
    spec:
      storageUri: "gs://seldon-models/scv2/samples/mlserver_1.5.0/income-sklearn/anchor-explainer"
      explainer:
        type: anchor_tabular
        modelRef: income

    Pipeline Explanations

    Blackbox explainers can explain a Pipeline as well as a model. An example from the Huggingface sentiment demo, which references a pipeline via pipelineRef, is shown at the end of this page (samples/models/hf-sentiment-explainer.yaml).

    Examples

    Tabular income classification model with Anchor Tabular black box model explainer
    Huggingface Sentiment model with Anchor Text black box pipeline explainer
    Anchor Text movies sentiment explainer

    The remaining listings on this page are reproduced for reference: the default k6 test configuration (tests/k6/configs/k8s/base/k6.yaml) and the model type definitions (tests/k6/components/model.js) used by the Model Performance Metrics tests above, followed by the Anchor Text pipeline explainer from the Huggingface sentiment demo.
      # tests/k6/configs/k8s/base/k6.yaml
      args: [
        "--no-teardown",
        "--summary-export",
        "results/base.json",
        "--out",
        "csv=results/base.gz",
        "-u",
        "5",
        "-i",
        "100000",
        "-d",
        "120m",
        "scenarios/infer_constant_vu.js",
        ]
      # # infer_constant_rate
      # args: [
      #   "--no-teardown",
      #   "--summary-export",
      #   "results/base.json",
      #   "--out",
      #   "csv=results/base.gz",
      #   "scenarios/infer_constant_rate.js",
      #   ]
      # # k8s-test-script
      # args: [
      #   "--summary-export",
      #   "results/base.json",
      #   "--out",
      #   "csv=results/base.gz",
      #   "scenarios/k8s-test-script.js",
      #   ]
      # # core2_qa_control_plane_ops
      # args: [
      #   "--no-teardown",
      #   "--verbose",
      #   "--summary-export",
      #   "results/base.json",
      #   "--out",
      #   "csv=results/base.gz",
      #   "-u",
      #   "5",
      #   "-i",
      #   "10000",
      #   "-d",
      #   "9h",
      #   "scenarios/core2_qa_control_plane_ops.js",
      #   ]
      - name: INFER_HTTP_ITERATIONS
        value: "1"
      - name: INFER_GRPC_ITERATIONS
        value: "1"
      - name: MODELNAME_PREFIX
        value: "tfsimplea,pytorch-cifar10a,tfmnista,mlflow-winea,irisa"
      - name: MODEL_TYPE
        value: "tfsimple,pytorch_cifar10,tfmnist,mlflow_wine,iris"
      # Specify MODEL_MEMORY_BYTES using unit-of measure suffixes (k, M, G, T)
      # rather than numbers without units of measure. If supplying "naked
      # numbers", the seldon operator will take care of converting the number
      # for you but also take ownership of the field (as FieldManager), so the
      # next time you run the scenario creating/updating of the model CR will
      # fail.
      - name: MODEL_MEMORY_BYTES
        value: "400k,8M,43M,200k,3M"
      - name: MAX_MEM_UPDATE_FRACTION
        value: "0.1"
      - name: MAX_NUM_MODELS
        value: "800,100,25,100,100"
        # value: "0,0,25,100,100"
      #
      # MAX_NUM_MODELS_HEADROOM is a variable used by control-plane tests.
      # It's the approximate number of models that can be created over
      # MAX_NUM_MODELS over the experiment. In the worst case scenario
      # (very unlikely) the HEADROOM values may temporarily exceed the ones
      # specified here with the number of VUs, because each VU checks the
      # headroom constraint independently before deciding on the available
      # operations (no communication/sync between VUs)
      # - name: MAX_NUM_MODELS_HEADROOM
      #   value: "20,5,0,20,30"
      #
      # MAX_MODEL_REPLICAS is used by control-plane tests. It controls the
      # maximum number of replicas that may be requested when
      # creating/updating models of a given type.
      # - name: MAX_MODEL_REPLICAS
      #   value: "2,2,0,2,2"
      #
      - name: INFER_BATCH_SIZE
        value: "1,1,1,1,1"
      # MODEL_CREATE_UPDATE_DELETE_BIAS defines the probability ratios between
      # the operations, for control-plane tests. For example, "1, 4, 3"
      # makes an Update four times more likely than a Create, and a Delete 3
      # times more likely than the Create.
      # - name: MODEL_CREATE_UPDATE_DELETE_BIAS
      #   value: "1,3,1"
      - name: WARMUP
        value: "false"
    // tests/k6/components/model.js
      import { dump as yamlDump } from "https://cdn.jsdelivr.net/npm/js-yaml@4/dist/js-yaml.mjs";
      import { getConfig } from '../components/settings.js'
    
      const tfsimple_string = "tfsimple_string"
      const tfsimple = "tfsimple"
      const iris = "iris"  // mlserver
      const pytorch_cifar10 = "pytorch_cifar10"
      const tfmnist = "tfmnist"
      const tfresnet152 = "tfresnet152"
      const onnx_gpt2 = "onnx_gpt2"
      const mlflow_wine = "mlflow_wine" // mlserver
      const add10 = "add10" // https://github.com/SeldonIO/triton-python-examples/tree/master/add10
      const sentiment = "sentiment" // mlserver
    # samples/models/hf-sentiment-explainer.yaml
    apiVersion: mlops.seldon.io/v1alpha1
    kind: Model
    metadata:
      name: sentiment-explainer
    spec:
      storageUri: "gs://seldon-models/scv2/examples/huggingface/speech-sentiment/explainer"
      explainer:
        type: anchor_text
        pipelineRef: sentiment-explain