There are two kinds of metrics present in Seldon Core 2:
* Operational metrics describe the performance of components in the system. Some examples of common operational considerations are memory consumption and CPU usage, request latency and throughput, and cache utilisation rates. Generally speaking, these are the metrics system administrators, operations teams, and engineers will be interested in.
* Usage metrics describe the system at a higher and less dynamic level. Some examples include the number of deployed servers and models, and component versions. These are not typically metrics that engineers need insight into, but they may be relevant to platform providers and operations teams.
While the system is running, we collect metrics via Prometheus that allow users to observe different aspects of SCv2 such as throughput, latency, memory, and CPU usage. This is in addition to the standard Kubernetes metrics that are scraped by Prometheus. There is also a Grafana dashboard (referenced below) that provides an overview of the system.
The list of SCv2 metrics that we are compiling is as follows.
For the agent that sits next to the inference servers:
For the pipeline gateway that handles requests to pipelines:
Many of these metrics are model- and pipeline-level counters and gauges. We also aggregate some of these metrics to speed up the display of graphs. For performance reasons, we do not currently store per-model histogram metrics; however, we do store per-pipeline histogram metrics.
This is experimental and these metrics are bound to change to reflect the trends we want to capture as we get more information about the usage of the system.
We have a prebuilt Grafana dashboard that makes use of many of the metrics that we expose.
Grafana and Prometheus are available when you run Seldon locally. You will be able to connect to the Grafana dashboard at http://localhost:3000. Prometheus will be available at http://localhost:9090.
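As a quick check that the local stack is up, you can query the standard Prometheus and Grafana HTTP endpoints directly; this assumes the default local ports above and that `curl` and `jq` are available.

```sh
# Confirm Prometheus is ready to serve queries
curl -s http://localhost:9090/-/ready

# List the targets Prometheus is scraping and their health
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

# Basic Grafana health check
curl -s http://localhost:3000/api/health
```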
Download the dashboard and import it into Grafana, making sure that the data source points to the correct Prometheus store. Find more information on how to import dashboards in the Grafana documentation.
An example is also available showing the raw metrics that Prometheus will scrape.
There are various interesting system metrics about how Seldon Core v2 is used. These metrics can be recorded anonymously and sent to Seldon by a lightweight, optional, stand-alone component called Hodometer.
When provided, these metrics are used to understand the adoption of Seldon Core v2 and how people interact with it. For example, it helps to know how many clusters Seldon Core v2 is running on, whether it is used in Kubernetes or for local development, and how many people are benefitting from features like multi-model serving.
Hodometer is not an integral part of Seldon Core v2, but rather an independent component which connects to the public APIs of the Seldon Core v2 scheduler. If deployed in Kubernetes, it will also try to request some basic information from the Kubernetes API.
Recorded metrics are sent to Seldon and, optionally, to any additional endpoints you define.
Hodometer was explicitly designed with privacy of user information and transparency of implementation in mind.
It does not record any sensitive or identifying information. For example, it has no knowledge of IP addresses, model names, or user information. All information sent to Seldon is anonymised with a completely random cluster identifier.
Hodometer supports different information levels, so you have full control over what metrics are provided to Seldon, if any.
For transparency, the implementation is fully open-source and designed to be easy to read. The full source code, including the definitions of the recorded metrics, is available in the Seldon Core v2 repository. See below for an equivalent table of metrics.
Metrics are collected as periodic snapshots a few times per day. They are lightweight to collect, coming mostly from the Seldon Core v2 scheduler, and are heavily aggregated. As such, they should have minimal impact on CPU, memory, and network consumption.
Hodometer does not store anything it records, so it does not have any persistent storage. As a result, it should not be considered a replacement for tools like Prometheus.
Hodometer supports three different metrics levels, described in the table below.
Alternatively, usage metrics can be completely disabled. To do so, simply remove any existing deployment of Hodometer or disable it in the installation for your environment, discussed below.
The environment variables listed in the table below control the behaviour of Hodometer, regardless of the environment it is installed in.
Hodometer is installed as a separate deployment, by default in the same namespace as the rest of the Seldon components.
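If you want to confirm that it is present in a Kubernetes installation, a quick check might look like the following; the deployment name and namespace are assumptions based on a default install and should be adjusted to your environment.

```sh
# Look for the Hodometer deployment and inspect its recent logs
# (deployment name and namespace are illustrative; adjust to your installation)
kubectl get deployment hodometer -n seldon-mesh
kubectl logs deployment/hodometer -n seldon-mesh --tail=20
```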
Helm
If you install Seldon Core v2 by Helm chart, there are values corresponding to the key environment variables discussed above. These Helm values and their equivalents are provided in the table below.
If you do not want usage metrics to be recorded, you can disable Hodometer via the `hodometer.disable` Helm value when installing the runtime Helm chart. Setting this value disables collection of usage metrics in fresh installations and also serves to remove Hodometer from an existing installation.
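A minimal sketch of such a command is shown below; the release name, chart reference, and namespace are placeholders and should match your own installation.

```sh
# Disable Hodometer (usage metrics) on install or upgrade of the runtime chart.
# Release name, chart reference, and namespace are placeholders.
helm upgrade --install seldon-core-v2-runtime seldon-charts/seldon-core-v2-runtime \
  --namespace seldon-mesh \
  --set hodometer.disable=true
```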
Note: It is good practice to set Helm values in a values file. These can be applied by using the `-f <filename>` switch when running Helm.
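For instance, the equivalent values-file form might look like this, again with illustrative release and chart names.

```sh
# Keep the setting in a values file rather than on the command line
cat > values.yaml <<EOF
hodometer:
  disable: true
EOF

helm upgrade --install seldon-core-v2-runtime seldon-charts/seldon-core-v2-runtime \
  --namespace seldon-mesh \
  -f values.yaml
```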
The Compose setup provides a pre-configured and opinionated, yet still flexible, approach to using Seldon Core v2.
Hodometer is defined as a service called `hodometer` in the Docker Compose manifest. It is automatically enabled when running as per the installation instructions.
You can disable Hodometer in Docker Compose by removing the corresponding service from the base manifest. Alternatively, you can gate it behind a profile. If the service is already running, you can stop it directly using `docker-compose stop ...`.
Configuration can be provided by environment variables when running `make` or directly invoking `docker-compose`. The available variables are defined in the Docker Compose environment file, prefixed with `HODOMETER_`.
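For instance, stopping the service or overriding a setting at startup might look like the following; the exact variable name is illustrative, based on the `HODOMETER_` prefix convention, so check the Compose environment file for the real names.

```sh
# Stop the Hodometer service in a running Docker Compose setup
docker-compose stop hodometer

# Override a Hodometer setting via an environment variable when starting the stack
# (HODOMETER_METRICS_LEVEL is an illustrative name; see the Compose environment file)
HODOMETER_METRICS_LEVEL=cluster docker-compose up -d
```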
Hodometer can be instructed to publish metrics not only to Seldon, but also to any extra endpoints you specify. This is controlled by the `EXTRA_PUBLISH_URLS` environment variable, which expects a comma-separated list of HTTP-compatible URLs.
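For example, assuming two endpoints of your own, the variable could be set as follows before starting Hodometer; the URLs are illustrative.

```sh
# Publish usage metrics to two extra endpoints in addition to Seldon (URLs illustrative)
export EXTRA_PUBLISH_URLS="http://my-endpoint-1:8000,http://my-endpoint-2:8000"
```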
You might choose to use this for your own usage monitoring. For example, you could capture these metrics and expose them to Prometheus or another monitoring system using your own service.
Metrics are recorded in a MixPanel-compatible format, which employs a highly flexible JSON schema.
For an example of how to define your own metrics listener, see the `receiver` Go package in the `hodometer` sub-project.
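As an illustration of the kind of listener you could write yourself (a minimal sketch, not the `receiver` package itself), the following Go program accepts POSTed JSON events and logs them; the port and behaviour are assumptions for demonstration.

```go
// Minimal sketch of a custom usage-metrics listener.
// It accepts HTTP POSTs containing JSON payloads and logs them; adapt it to
// forward the events to your own monitoring system.
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		defer r.Body.Close()

		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}

		// The payload is MixPanel-compatible JSON, so decode it generically.
		var event interface{}
		if err := json.Unmarshal(body, &event); err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}

		log.Printf("received usage metrics event: %v", event)
		w.WriteHeader(http.StatusOK)
	})

	log.Println("listening on :8000")
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

Such a service could then be registered as one of the extra publish URLs, for example `http://my-endpoint-1:8000` as in the configuration table below.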
The available metrics levels are:

Level | Description |
---|---|
Cluster | Basic information about the Seldon Core v2 installation |
Resource | High-level information about which Seldon Core v2 resources are used |
Feature | More detailed information about how resources are used and whether or not certain feature flags are enabled |

The environment variables recognised by Hodometer are:

Flag | Format | Example | Description |
---|---|---|---|
`METRICS_LEVEL` | string | `feature` | Level of detail for recorded metrics; one of `feature`, `resource`, or `cluster` |
`EXTRA_PUBLISH_URLS` | comma-separated list of URLs | `http://my-endpoint-1:8000,http://my-endpoint-2:8000` | Additional endpoints to publish metrics to |
`SCHEDULER_HOST` | string | `seldon-scheduler` | Hostname for Seldon Core v2 scheduler |
`SCHEDULER_PORT` | integer | `9004` | Port for Seldon Core v2 scheduler |
`LOG_LEVEL` | string | `info` | Level of detail for application logs |

The Helm values and their equivalent environment variables are:

Helm value | Environment variable |
---|---|
`hodometer.metricsLevel` | `METRICS_LEVEL` |
`hodometer.extraPublishUrls` | `EXTRA_PUBLISH_URLS` |
`hodometer.logLevel` | `LOG_LEVEL` |

The full list of usage metrics is:

Metric name | Level | Format | Notes |
---|---|---|---|
`cluster_id` | cluster | UUID | A random identifier for this cluster for de-duplication |
`seldon_core_version` | cluster | Version number | E.g. `1.2.3` |
`is_global_installation` | cluster | Boolean | Whether installation is global or namespaced |
`is_kubernetes` | cluster | Boolean | Whether or not the installation is in Kubernetes |
`kubernetes_version` | cluster | Version number | Kubernetes server version, if inside Kubernetes |
`node_count` | cluster | Integer | Number of nodes in the cluster, if inside Kubernetes |
`model_count` | resource | Integer | Number of `Model` resources |
`pipeline_count` | resource | Integer | Number of `Pipeline` resources |
`experiment_count` | resource | Integer | Number of `Experiment` resources |
`server_count` | resource | Integer | Number of `Server` resources |
`server_replica_count` | resource | Integer | Total number of `Server` resource replicas |
`multimodel_enabled_count` | feature | Integer | Number of `Server` resources with multi-model serving enabled |
`overcommit_enabled_count` | feature | Integer | Number of `Server` resources with overcommitting enabled |
`gpu_enabled_count` | feature | Integer | Number of `Server` resources with GPUs attached |
`inference_server_name` | feature | String | Name of inference server, e.g. MLServer or Triton |
`server_cpu_cores_sum` | feature | Float | Total of CPU limits across all `Server` resource replicas, in cores |
`server_memory_gb_sum` | feature | Float | Total of memory limits across all `Server` resource replicas, in GiB |