Metrics

Out-of-the-box, MLServer exposes a set of metrics that help you monitor your machine learning workloads in production. These include standard metrics like number of requests and latency.

On top of these, you can also register and track your own custom metrics as part of your custom inference runtimes.

Default Metrics

By default, MLServer will expose metrics around inference requests (count and error rate) and the status of its internal request queues. These internal queues are used for adaptive batching and for communication with the inference workers.

| Metric Name | Description |
| --- | --- |
| `model_infer_request_success` | Number of successful inference requests. |
| `model_infer_request_failure` | Number of failed inference requests. |
| `batch_request_queue` | Queue size for the adaptive batching queue. |
| `parallel_request_queue` | Queue size for the inference workers queue. |
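
As a quick way to see these metrics, you can poll MLServer's metrics endpoint and filter for the default metric names. The sketch below assumes MLServer is running locally with the default metrics settings (port `8082` and the `/metrics` path, described in the Settings section below); adjust the URL if you have overridden them.

```python
import urllib.request

# Assumes the default `metrics_port` (8082) and `metrics_endpoint` (/metrics).
METRICS_URL = "http://localhost:8082/metrics"

with urllib.request.urlopen(METRICS_URL) as response:
    payload = response.read().decode("utf-8")

# Only print the lines for the default inference and queue metrics.
DEFAULT_METRICS = (
    "model_infer_request_success",
    "model_infer_request_failure",
    "batch_request_queue",
    "parallel_request_queue",
)
for line in payload.splitlines():
    if line.startswith(DEFAULT_METRICS):
        print(line)
```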

REST Server Metrics

On top of the default set of metrics, MLServer's REST server will also expose a set of metrics specific to REST.

The prefix for the REST-specific metrics depends on the `metrics_rest_server_prefix` flag from the MLServer settings. For example, with the default `rest_server` prefix, the `[rest_server]_requests` metric is exposed as `rest_server_requests`.

| Metric Name | Description |
| --- | --- |
| `[rest_server]_requests` | Number of REST requests, labelled by endpoint and status code. |
| `[rest_server]_requests_duration_seconds` | Latency of REST requests. |
| `[rest_server]_requests_in_progress` | Number of in-flight REST requests. |

gRPC Server Metrics

On top of the default set of metrics, MLServer's gRPC server will also expose a set of metrics specific to gRPC.

| Metric Name | Description |
| --- | --- |
| `grpc_server_handled` | Number of gRPC requests, labelled by gRPC code and method. |
| `grpc_server_started` | Number of in-flight gRPC requests. |

Custom Metrics

MLServer allows you to register custom metrics within your custom inference runtimes. This can be done through the mlserver.register() and mlserver.log() methods.

  • mlserver.register: Register a new metric.

  • mlserver.log: Log a new set of metric / value pairs. If there's any unregistered metric, it will get registered on-the-fly.

Under the hood, metrics logged through the `mlserver.log` method will get exposed to Prometheus as a Histogram.

Custom metrics will generally be registered in the `load()` method and then used in the `predict()` method of your custom runtime:

```python
import mlserver

from mlserver.types import InferenceRequest, InferenceResponse

class MyCustomRuntime(mlserver.MLModel):
  async def load(self) -> bool:
    self._model = load_my_custom_model()
    mlserver.register("my_custom_metric", "This is a custom metric example")
    return True

  async def predict(self, payload: InferenceRequest) -> InferenceResponse:
    mlserver.log(my_custom_metric=34)
    # TODO: Replace with custom logic to run inference
    return self._model.predict(payload)
```
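
Since `mlserver.log` takes a set of metric / value pairs, you can also record several custom metrics in a single call and rely on on-the-fly registration for any metric that wasn't registered in `load()`. The sketch below illustrates this; the runtime class, metric names and values are hypothetical and only meant to show the calls.

```python
import mlserver

from mlserver.types import InferenceRequest, InferenceResponse

class MultiMetricRuntime(mlserver.MLModel):
  async def load(self) -> bool:
    # Register one metric up-front; `tokens_in_request` below is left
    # unregistered on purpose and will be registered on-the-fly.
    mlserver.register("preprocessing_ms", "Time spent pre-processing a request")
    return True

  async def predict(self, payload: InferenceRequest) -> InferenceResponse:
    # Log several metric / value pairs in a single call.
    # The metric names and values here are hypothetical.
    mlserver.log(preprocessing_ms=12.5, tokens_in_request=128)
    # TODO: Replace with custom logic to run inference
    return InferenceResponse(model_name=self.name, outputs=[])
```

As with the example above, each logged value gets exposed to Prometheus as a Histogram observation.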

Metrics Labelling

For metrics specific to a model (e.g. custom metrics, request counts, etc.), MLServer will always label these with the model name and model version. Downstream, this will allow you to aggregate and query metrics per model.

If these labels are not present on a specific metric, it means that metric can't be sliced at the model level.

Below, you can find the list of standardised labels that you will find on model-specific metrics:

| Label Name | Description |
| --- | --- |
| `model_name` | Model name (e.g. `my-custom-model`) |
| `model_version` | Model version (e.g. `v1.2.3`) |
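
As an illustration of how these labels can be used downstream, the sketch below polls the metrics endpoint and groups sampled values per `(model_name, model_version)` pair. It assumes MLServer is running locally with the default metrics settings described in the Settings section below, and it uses the `prometheus_client` parser purely as a convenient way to read the exposition format; it is not part of MLServer's API.

```python
import urllib.request
from collections import defaultdict

from prometheus_client.parser import text_string_to_metric_families

# Assumes the default metrics server settings (port 8082, /metrics path).
METRICS_URL = "http://localhost:8082/metrics"

with urllib.request.urlopen(METRICS_URL) as response:
    raw_metrics = response.read().decode("utf-8")

# Group sample values by (model_name, model_version), skipping any sample
# that doesn't carry the model-specific labels.
per_model = defaultdict(dict)
for family in text_string_to_metric_families(raw_metrics):
    for sample in family.samples:
        model = (sample.labels.get("model_name"), sample.labels.get("model_version"))
        if model[0] is None:
            continue  # Not model-specific; can't be sliced at the model level
        per_model[model][sample.name] = sample.value

for (name, version), metrics in per_model.items():
    print(f"{name} ({version}): {metrics}")
```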

Settings

MLServer exposes metric values through a metrics endpoint served on its own metrics server. This endpoint can be polled by Prometheus or other OpenMetrics-compatible backends.

Below you can find the available settings to control the behaviour of the metrics server:

| Setting | Description | Default |
| --- | --- | --- |
| `metrics_endpoint` | Path under which the metrics endpoint will be exposed. | `/metrics` |
| `metrics_port` | Port used to serve the metrics server. | `8082` |
| `metrics_rest_server_prefix` | Prefix used for metric names specific to MLServer's REST inference interface. | `rest_server` |
| `metrics_dir` | Directory used to store internal metric files (used to support metrics sharing across the inference workers). This is equivalent to Prometheus' `$PROMETHEUS_MULTIPROC_DIR` env var. | MLServer's current working directory (i.e. `$PWD`) |
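
These are regular MLServer settings, so they can be provided through the usual mechanisms (e.g. a `settings.json` file or environment variables). As a quick sanity check of the values in play, the sketch below instantiates MLServer's `Settings` class directly with the documented defaults made explicit; this assumes the class exposes these fields under the same names used in the table above, and is only an illustration, not a recommendation to configure the server programmatically.

```python
from mlserver.settings import Settings

# Construct the settings with the documented defaults made explicit.
settings = Settings(
    metrics_endpoint="/metrics",
    metrics_port=8082,
    metrics_rest_server_prefix="rest_server",
)

print(settings.metrics_endpoint)            # -> /metrics
print(settings.metrics_port)                # -> 8082
print(settings.metrics_rest_server_prefix)  # -> rest_server
print(settings.metrics_dir)                 # -> defaults to the current working directory
```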