Metrics
Out-of-the-box, MLServer exposes a set of metrics that help you monitor your machine learning workloads in production. These include standard metrics like number of requests and latency.
On top of these, you can also register and track your own custom metrics as part of your custom inference runtimes.
By default, MLServer will expose metrics around inference requests (count and error rate) and the status of its internal requests queues. These internal queues are used for adaptive batching and parallel inference.
| Metric Name | Description |
| --- | --- |
| `model_infer_request_success` | Number of successful inference requests. |
| `model_infer_request_failure` | Number of failed inference requests. |
| `batch_request_queue` | Queue size for the adaptive batching queue. |
| `parallel_request_queue` | Queue size for the parallel inference queue. |
On top of the default set of metrics, MLServer's REST server will also expose the following metrics, specific to REST:

| Metric Name | Description |
| --- | --- |
| `[rest_server]_requests` | Number of REST requests, labelled by endpoint and status code. |
| `[rest_server]_requests_duration_seconds` | Latency of REST requests. |
| `[rest_server]_requests_in_progress` | Number of in-flight REST requests. |
On top of the default set of metrics, MLServer's gRPC server will also expose the following metrics, specific to gRPC:

| Metric Name | Description |
| --- | --- |
| `grpc_server_handled` | Number of gRPC requests, labelled by gRPC code and method. |
| `grpc_server_started` | Number of in-flight gRPC requests. |
MLServer allows you to register custom metrics within your custom inference runtimes. This can be done through the `mlserver.register()` and `mlserver.log()` methods.

- `mlserver.register`: Register a new metric.
- `mlserver.log`: Log a new set of metric / value pairs. If any of the metrics is not registered yet, it will get registered on-the-fly.

Custom metrics will generally be registered in the `load()` method and then used in the `predict()` method of your custom runtime, as in the sketch below.
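The example below is a minimal sketch of such a runtime: it registers a metric when the model loads and logs a value for it on every request. The runtime name `MyCustomRuntime`, the metric name `my_custom_metric` and the logged value are placeholders, and the response-building logic is left as a stub.

```python
import mlserver

from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Register the custom metric once, when the model gets loaded.
        mlserver.register("my_custom_metric", "This is a custom metric example")
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Log a value for the custom metric on every inference request.
        # The value logged here (34) is just a placeholder.
        mlserver.log(my_custom_metric=34)

        # TODO: replace with your actual inference logic and response.
        return InferenceResponse(model_name=self.name, outputs=[])
```

Once the model is being served, the metric will be exposed through the metrics endpoint described below, labelled with the model's name and version.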
For metrics specific to a model (e.g. custom metrics, request counts, etc.), MLServer will always label these with the model name and model version. Downstream, this allows you to aggregate and query metrics per model.

Below, you can find the list of standardised labels that you will be able to find on model-specific metrics:

| Label Name | Description |
| --- | --- |
| `model_name` | Model name (e.g. `my-custom-model`). |
| `model_version` | Model version (e.g. `v1.2.3`). |
MLServer will expose metric values through a metrics endpoint, served by its own metrics server. This endpoint can be polled by Prometheus or other Prometheus-compatible backends.

Below you can find the settings available to control the behaviour of the metrics server:

| Setting | Description | Default |
| --- | --- | --- |
| `metrics_endpoint` | Path under which the metrics endpoint will be exposed. | `/metrics` |
| `metrics_port` | Port used to serve the metrics server. | `8082` |
| `metrics_rest_server_prefix` | Prefix used for metric names specific to MLServer's REST inference interface. | `rest_server` |
| `metrics_dir` | Directory used to store internal metric files (used to support metrics sharing across inference workers). This is equivalent to Prometheus' `PROMETHEUS_MULTIPROC_DIR` env var. | MLServer's current working directory (i.e. `$PWD`) |
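As an example of how these settings fit together, the snippet below sketches a server-wide `settings.json` (MLServer's server configuration file) that moves the metrics endpoint to a different path and port. The specific values are illustrative, and any setting left out keeps the default listed above.

```json
{
  "metrics_endpoint": "/prometheus",
  "metrics_port": 9090,
  "metrics_dir": "/tmp/mlserver-metrics"
}
```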