There are two kinds of metrics present in Seldon Core 2:
* Operational metrics describe the performance of components in the system. Some examples of common operational considerations are memory consumption and CPU usage, request latency and throughput, and cache utilisation rates. Generally speaking, these are the metrics system administrators, operations teams, and engineers will be interested in.
* Usage metrics describe the system at a higher and less dynamic level. Some examples include the number of deployed servers and models, and component versions. These are not typically metrics that engineers need insight into, but they may be relevant to platform providers and operations teams.
While the system is running, we collect metrics via Prometheus that allow users to observe different aspects of SCv2 such as throughput, latency, memory, and CPU usage. This is in addition to the standard Kubernetes metrics that are scraped by Prometheus. There is also a Grafana dashboard (referenced below) that provides an overview of the system.
The list of SCv2 metrics that we are compiling is as follows.
For the agent that sits next to the inference servers:
For the pipeline gateway that handles requests to pipelines:
Many of these metrics are model- and pipeline-level counters and gauges. We also aggregate some of these metrics to speed up the display of graphs. For performance reasons, we do not currently store per-model histogram metrics; however, we do store per-pipeline histogram metrics.
This is experimental and these metrics are bound to change to reflect the trends we want to capture as we get more information about the usage of the system.
We have a prebuilt Grafana dashboard that makes use of many of the metrics that we expose.
Grafana and Prometheus are available when you run Seldon locally. You will be able to connect to the Grafana dashboard at http://localhost:3000. Prometheus will be available at http://localhost:9090.
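As a quick check that the local stack is up, you can query the standard Prometheus and Grafana HTTP endpoints directly; this assumes the default local ports above and that `curl` and `jq` are available.

```sh
# Confirm Prometheus is ready to serve queries
curl -s http://localhost:9090/-/ready

# List the targets Prometheus is scraping and their health
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

# Basic Grafana health check
curl -s http://localhost:3000/api/health
```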
Download the dashboard and import it into Grafana, making sure that the data source points to the correct Prometheus store. Find more information on how to import dashboards in the Grafana documentation.
An example is also available showing the raw metrics that Prometheus will scrape.
There are various interesting system metrics about how Seldon Core v2 is used. These metrics can be recorded anonymously and sent to Seldon by a lightweight, optional, stand-alone component called Hodometer.
When provided, these metrics are used to understand the adoption of Seldon Core v2 and how people interact with it. For example, it helps to know how many clusters Seldon Core v2 is running on, whether it is used in Kubernetes or for local development, and how many people are benefitting from features like multi-model serving.
Hodometer is not an integral part of Seldon Core v2, but rather an independent component which connects to the public APIs of the Seldon Core v2 scheduler. If deployed in Kubernetes, it will also try to request some basic information from the Kubernetes API.
Recorded metrics are sent to Seldon and, optionally, to any additional endpoints you define.
Hodometer was explicitly designed with privacy of user information and transparency of implementation in mind.
It does not record any sensitive or identifying information. For example, it has no knowledge of IP addresses, model names, or user information. All information sent to Seldon is anonymised with a completely random cluster identifier.
Hodometer supports different information levels, so you have full control over what metrics are provided to Seldon, if any.
For transparency, the implementation is fully open-source and designed to be easy to read. The full source code, including the definitions of the recorded metrics, is available in the Seldon Core v2 repository. See below for an equivalent table of metrics.
Metrics are collected as periodic snapshots a few times per day. They are lightweight to collect, coming mostly from the Seldon Core v2 scheduler, and are heavily aggregated. As such, they should have minimal impact on CPU, memory, and network consumption.
Hodometer does not store anything it records, so it does not have any persistent storage. As a result, it should not be considered a replacement for tools like Prometheus.
Hodometer supports three different metrics levels, described in the table below.
Alternatively, usage metrics can be completely disabled. To do so, simply remove any existing deployment of Hodometer or disable it in the installation for your environment, discussed below.
The environment variables listed in the table below control the behaviour of Hodometer, regardless of the environment it is installed in.
Hodometer is installed as a separate deployment, by default in the same namespace as the rest of the Seldon components.
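If you want to confirm that it is present in a Kubernetes installation, a quick check might look like the following; the deployment name and namespace are assumptions based on a default install and should be adjusted to your environment.

```sh
# Look for the Hodometer deployment and inspect its recent logs
# (deployment name and namespace are illustrative; adjust to your installation)
kubectl get deployment hodometer -n seldon-mesh
kubectl logs deployment/hodometer -n seldon-mesh --tail=20
```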
Helm
If you install Seldon Core v2 by Helm chart, there are values corresponding to the key environment variables discussed above. These Helm values and their equivalents are provided in the table below.
If you do not want usage metrics to be recorded, you can disable Hodometer via the `hodometer.disable` Helm value when installing the runtime Helm chart. Setting this value disables collection of usage metrics in fresh installations and also serves to remove Hodometer from an existing installation.
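A minimal sketch of such a command is shown below; the release name, chart reference, and namespace are placeholders and should match your own installation.

```sh
# Disable Hodometer (usage metrics) on install or upgrade of the runtime chart.
# Release name, chart reference, and namespace are placeholders.
helm upgrade --install seldon-core-v2-runtime seldon-charts/seldon-core-v2-runtime \
  --namespace seldon-mesh \
  --set hodometer.disable=true
```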
Note: It is good practice to set Helm values in a values file. These can be applied by using the `-f <filename>` switch when running Helm.
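For instance, the equivalent values-file form might look like this, again with illustrative release and chart names.

```sh
# Keep the setting in a values file rather than on the command line
cat > values.yaml <<EOF
hodometer:
  disable: true
EOF

helm upgrade --install seldon-core-v2-runtime seldon-charts/seldon-core-v2-runtime \
  --namespace seldon-mesh \
  -f values.yaml
```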
The Compose setup provides a pre-configured and opinionated, yet still flexible, approach to using Seldon Core v2.
Hodometer is defined as a service called `hodometer` in the Docker Compose manifest. It is automatically enabled when running as per the installation instructions.
You can disable Hodometer in Docker Compose by removing the corresponding service from the base manifest. Alternatively, you can gate it behind a profile. If the service is already running, you can stop it directly using `docker-compose stop ...`.
Configuration can be provided by environment variables when running `make` or directly invoking `docker-compose`. The available variables are defined in the Docker Compose environment file, prefixed with `HODOMETER_`.
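For instance, stopping the service or overriding a setting at startup might look like the following; the exact variable name is illustrative, based on the `HODOMETER_` prefix convention, so check the Compose environment file for the real names.

```sh
# Stop the Hodometer service in a running Docker Compose setup
docker-compose stop hodometer

# Override a Hodometer setting via an environment variable when starting the stack
# (HODOMETER_METRICS_LEVEL is an illustrative name; see the Compose environment file)
HODOMETER_METRICS_LEVEL=cluster docker-compose up -d
```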
Hodometer can be instructed to publish metrics not only to Seldon, but also to any extra endpoints you specify. This is controlled by the `EXTRA_PUBLISH_URLS` environment variable, which expects a comma-separated list of HTTP-compatible URLs.
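For example, assuming two endpoints of your own, the variable could be set as follows before starting Hodometer; the URLs are illustrative.

```sh
# Publish usage metrics to two extra endpoints in addition to Seldon (URLs illustrative)
export EXTRA_PUBLISH_URLS="http://my-endpoint-1:8000,http://my-endpoint-2:8000"
```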
You might choose to use this for your own usage monitoring. For example, you could capture these metrics and expose them to Prometheus or another monitoring system using your own service.
Metrics are recorded in a MixPanel-compatible format, which employs a highly flexible JSON schema.
For an example of how to define your own metrics listener, see the `receiver` Go package in the `hodometer` sub-project.
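As an illustration of the kind of listener you could write yourself (a minimal sketch, not the `receiver` package itself), the following Go program accepts POSTed JSON events and logs them; the port and behaviour are assumptions for demonstration.

```go
// Minimal sketch of a custom usage-metrics listener.
// It accepts HTTP POSTs containing JSON payloads and logs them; adapt it to
// forward the events to your own monitoring system.
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		defer r.Body.Close()

		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}

		// The payload is MixPanel-compatible JSON, so decode it generically.
		var event interface{}
		if err := json.Unmarshal(body, &event); err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}

		log.Printf("received usage metrics event: %v", event)
		w.WriteHeader(http.StatusOK)
	})

	log.Println("listening on :8000")
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

Such a service could then be registered as one of the extra publish URLs, for example `http://my-endpoint-1:8000` as in the configuration table below.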
The available metrics levels are:

Level | Description |
---|---|
Cluster | Basic information about the Seldon Core v2 installation |
Resource | High-level information about which Seldon Core v2 resources are used |
Feature | More detailed information about how resources are used and whether or not certain feature flags are enabled |

The environment variables recognised by Hodometer are:

Flag | Format | Example | Description |
---|---|---|---|
`METRICS_LEVEL` | string | `feature` | Level of detail for recorded metrics; one of `feature`, `resource`, or `cluster` |
`EXTRA_PUBLISH_URLS` | comma-separated list of URLs | `http://my-endpoint-1:8000,http://my-endpoint-2:8000` | Additional endpoints to publish metrics to |
`SCHEDULER_HOST` | string | `seldon-scheduler` | Hostname for Seldon Core v2 scheduler |
`SCHEDULER_PORT` | integer | `9004` | Port for Seldon Core v2 scheduler |
`LOG_LEVEL` | string | `info` | Level of detail for application logs |

The Helm values and their equivalent environment variables are:

Helm value | Environment variable |
---|---|
`hodometer.metricsLevel` | `METRICS_LEVEL` |
`hodometer.extraPublishUrls` | `EXTRA_PUBLISH_URLS` |
`hodometer.logLevel` | `LOG_LEVEL` |

The full list of usage metrics is:

Metric name | Level | Format | Notes |
---|---|---|---|
`cluster_id` | cluster | UUID | A random identifier for this cluster for de-duplication |
`seldon_core_version` | cluster | Version number | E.g. `1.2.3` |
`is_global_installation` | cluster | Boolean | Whether installation is global or namespaced |
`is_kubernetes` | cluster | Boolean | Whether or not the installation is in Kubernetes |
`kubernetes_version` | cluster | Version number | Kubernetes server version, if inside Kubernetes |
`node_count` | cluster | Integer | Number of nodes in the cluster, if inside Kubernetes |
`model_count` | resource | Integer | Number of `Model` resources |
`pipeline_count` | resource | Integer | Number of `Pipeline` resources |
`experiment_count` | resource | Integer | Number of `Experiment` resources |
`server_count` | resource | Integer | Number of `Server` resources |
`server_replica_count` | resource | Integer | Total number of `Server` resource replicas |
`multimodel_enabled_count` | feature | Integer | Number of `Server` resources with multi-model serving enabled |
`overcommit_enabled_count` | feature | Integer | Number of `Server` resources with overcommitting enabled |
`gpu_enabled_count` | feature | Integer | Number of `Server` resources with GPUs attached |
`inference_server_name` | feature | String | Name of inference server, e.g. MLServer or Triton |
`server_cpu_cores_sum` | feature | Float | Total of CPU limits across all `Server` resource replicas, in cores |
`server_memory_gb_sum` | feature | Float | Total of memory limits across all `Server` resource replicas, in GiB |