# Operational Metrics

While the system runs, Prometheus collects metrics that enable you to observe various aspects of Seldon Core 2, including throughput, latency, memory, and CPU usage. In addition to the standard Kubernetes metrics scraped by Prometheus, a [Grafana dashboard](/seldon-core-2/user-guide/operational-monitoring/observability.md#grafana) provides a comprehensive system overview.

## List of Seldon Core 2 metrics

The list of Seldon Core 2 metrics that are compiling is as follows.

For the agent that sits next to the inference servers:

```go
// scheduler/pkg/metrics/agent.go
const (
	// Histograms do no include pipeline label for efficiency
	modelHistogramName = "seldon_model_infer_api_seconds"
	// We use base infer counters to store core metrics per pipeline
	modelInferCounterName                 = "seldon_model_infer_total"
	modelInferLatencyCounterName          = "seldon_model_infer_seconds_total"
	modelAggregateInferCounterName        = "seldon_model_aggregate_infer_total"
	modelAggregateInferLatencyCounterName = "seldon_model_aggregate_infer_seconds_total"
)

// Agent metrics
const (
	cacheEvictCounterName                              = "seldon_cache_evict_count"
	cacheMissCounterName                               = "seldon_cache_miss_count"
	loadModelCounterName                               = "seldon_load_model_counter"
	unloadModelCounterName                             = "seldon_unload_model_counter"
	loadedModelGaugeName                               = "seldon_loaded_model_gauge"
	loadedModelMemoryGaugeName                         = "seldon_loaded_model_memory_bytes_gauge"
	evictedModelMemoryGaugeName                        = "seldon_evicted_model_memory_bytes_gauge"
	serverReplicaMemoryCapacityGaugeName               = "seldon_server_replica_memory_capacity_bytes_gauge"
	serverReplicaMemoryCapacityWithOverCommitGaugeName = "seldon_server_replica_memory_capacity_overcommit_bytes_gauge"
)
```

For the pipeline gateway that handles requests to pipelines:

```go
// scheduler/pkg/metrics/gateway.go
// The aggregate metrics exist for efficiency, as the summation can be
// very slow in Prometheus when many pipelines exist.
const (
	// Histograms do no include model label for efficiency
	pipelineHistogramName = "seldon_pipeline_infer_api_seconds"
	// We use base infer counters to store core metrics per pipeline
	pipelineInferCounterName                 = "seldon_pipeline_infer_total"
	pipelineInferLatencyCounterName          = "seldon_pipeline_infer_seconds_total"
	pipelineAggregateInferCounterName        = "seldon_pipeline_aggregate_infer_total"
	pipelineAggregateInferLatencyCounterName = "seldon_pipeline_aggregate_infer_seconds_total"
)
```

Many of these metrics are model and pipeline level counters and gauges. Some of these metrics are aggregated to speed up the display of graphs. Currently,per-model histogram metrics are not stored for performance reasons. However, per-pipeline histogram metrics are stored.

This is experimental, and these metrics are expected to evolve to better capture relevant trends as more information becomes available about system usage.

### Local Metrics Examples

An [example](/seldon-core-2/user-guide/operational-monitoring/local-metrics-test.md) to show raw metrics that Prometheus will scrape.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.seldon.ai/seldon-core-2/user-guide/operational-monitoring/operational.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
