Classification Metrics
The module provides metrics for binary and multiclass classification. Multilabel classification is not supported.
Confusion Matrix
Displays the True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), summarising the model’s classification performance.
The classification metrics API returns:
A list of categories (classes).
TP, TN, FP, and FN for each class.
A flattened confusion matrix.
Given the following confusion matrix:

The flattened confusion matrix values will be 3, 0, 0, 0, 2, 1, 0, 0, 4.
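To make the relationship between the flattened values and the per-class counts concrete, here is a minimal sketch that rebuilds a square matrix from the flattened list and derives TP, TN, FP, and FN for each class. It assumes row-major flattening, with rows as ground truth and columns as predictions; neither convention is stated in this page, and under the opposite orientation FP and FN swap. The helper names `unflatten` and `per_class_counts` are hypothetical and not part of the API.

```python
from math import isqrt


def unflatten(flat):
    """Rebuild a square matrix from a flattened list (row-major order assumed)."""
    n = isqrt(len(flat))
    if n * n != len(flat):
        raise ValueError("flattened confusion matrix must have a square length")
    return [flat[r * n:(r + 1) * n] for r in range(n)]


def per_class_counts(matrix):
    """Derive one-vs-rest counts per class.

    Assumption: rows are ground truth, columns are predictions.
    """
    total = sum(sum(row) for row in matrix)
    counts = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # rest of the ground-truth row
        fp = sum(row[k] for row in matrix) - tp  # rest of the predicted column
        tn = total - tp - fn - fp
        counts.append({"TP": tp, "TN": tn, "FP": fp, "FN": fn})
    return counts


matrix = unflatten([3, 0, 0, 0, 2, 1, 0, 0, 4])
counts = per_class_counts(matrix)
for k, c in enumerate(counts):
    print(k, c)
```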
Accuracy
Measures the fraction of correct predictions:
$$\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(\hat{y}_i = y_i)$$
Where:
$\hat{y}_i$ is the predicted value,
$y_i$ is the ground truth value,
$N$ is the total number of samples.
Binary Classification
For the following predictions and ground truth of a binary classification problem:
Predictions:
1, 1, 0, 1, 0, 1, 0, 1, 1, 0
Ground truth:
0, 0, 0, 0, 0, 0, 0, 0, 1, 1
The accuracy is calculated as:
$$\text{Accuracy} = \frac{4}{10} = 0.40$$
In this case, the accuracy metric is 0.40, meaning that 40% of the predictions match the ground truth.
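The binary example can be checked by hand with a few lines of Python. This is a minimal sketch of the accuracy definition, not the module's implementation; the function name `accuracy` is chosen here for illustration.

```python
def accuracy(predictions, ground_truth):
    """Fraction of positions where the prediction equals the ground truth."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must have the same length")
    if not predictions:
        raise ValueError("at least one sample is required")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(predictions)


predictions = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]
ground_truth = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
result = accuracy(predictions, ground_truth)
print(result)  # 0.4 (4 matches out of 10 samples)
```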
Multi-Class Classification
For the following predictions and ground truth of a multi-class classification problem:
Predictions:
0, 1, 0, 1, 2, 2, 2, 1, 2, 0, 0, 2, 2, 1, 2
Ground truth:
0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
The accuracy is calculated as:
$$\text{Accuracy} = \frac{9}{15} = 0.60$$
In this case, the accuracy metric is 0.60, meaning that 60% of the predictions match the ground truth.
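The same figure can be read off a confusion matrix: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total count. The sketch below builds the matrix for the multi-class example, assuming rows are ground truth and columns are predictions (an orientation chosen here for illustration). The function name `confusion_matrix` is hypothetical.

```python
def confusion_matrix(predictions, ground_truth, num_classes):
    """Count (ground truth, prediction) pairs; rows are ground truth (assumed)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred, truth in zip(predictions, ground_truth):
        matrix[truth][pred] += 1
    return matrix


predictions = [0, 1, 0, 1, 2, 2, 2, 1, 2, 0, 0, 2, 2, 1, 2]
ground_truth = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

matrix = confusion_matrix(predictions, ground_truth, num_classes=3)
correct = sum(matrix[k][k] for k in range(3))  # diagonal entries
total = sum(sum(row) for row in matrix)
accuracy = correct / total
print(matrix)    # [[2, 1, 0], [0, 1, 1], [2, 2, 6]]
print(accuracy)  # 0.6
```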
Precision, Recall, Specificity and F1
Definition
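Precision: Evaluates the proportion of positive predictions that are correct:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall: Assesses the proportion of actual positives correctly identified:
$$\text{Recall} = \frac{TP}{TP + FN}$$
Specificity: Measures the proportion of actual negatives correctly identified:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
F1 Score: Represents the harmonic mean of Precision and Recall, balancing their trade-offs:
$$\text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$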
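The four definitions translate directly into code once the TP, TN, FP, and FN counts for a class are known. The following is a minimal sketch; returning 0.0 when a denominator is zero is an assumption made here, since this page does not say how the module handles that case, and the counts in the usage lines are illustrative values only.

```python
def safe_div(numerator, denominator):
    """Return 0.0 on a zero denominator (assumed convention, not documented)."""
    return numerator / denominator if denominator else 0.0


def precision(tp, fp):
    return safe_div(tp, tp + fp)


def recall(tp, fn):
    return safe_div(tp, tp + fn)


def specificity(tn, fp):
    return safe_div(tn, tn + fp)


def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return safe_div(2 * p * r, p + r)  # harmonic mean of precision and recall


# Illustrative counts for a single class: TP=4, TN=5, FP=1, FN=0.
print(precision(4, 1))    # 0.8
print(recall(4, 0))       # 1.0
print(specificity(5, 1))
print(f1_score(4, 1, 0))
```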
Average Type
Each metric is calculated using the macro averaging method (for both binary and multi-class classification), which involves the following steps:
First, the metric is calculated for each class.
Then, the unweighted mean of these individual metrics is computed.
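The final formulas for each metric are as follows, where $c$ represents each class of $C$ classes:
$$\text{Precision}_{\text{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c}$$
$$\text{Recall}_{\text{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FN_c}$$
$$\text{Specificity}_{\text{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{TN_c}{TN_c + FP_c}$$
$$\text{F1}_{\text{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{2 \cdot \text{Precision}_c \cdot \text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}$$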
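The two macro-averaging steps can be sketched as follows, using the multi-class predictions and ground truth from the accuracy example. This is an illustration of the method described above, not the module's code: the function names are hypothetical, the set of classes is taken from the labels that appear in the data, and a zero denominator is treated as 0.0 by assumption.

```python
def one_vs_rest_counts(predictions, ground_truth, cls):
    """TP, TN, FP, FN for one class, treating all other classes as negative."""
    pairs = list(zip(predictions, ground_truth))
    tp = sum(p == cls and t == cls for p, t in pairs)
    fp = sum(p == cls and t != cls for p, t in pairs)
    fn = sum(p != cls and t == cls for p, t in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    # Assumed convention: a metric with a zero denominator counts as 0.0.
    return numerator / denominator if denominator else 0.0


def macro_metrics(predictions, ground_truth):
    classes = sorted(set(predictions) | set(ground_truth))
    per_class = {"precision": [], "recall": [], "specificity": [], "f1": []}
    # Step 1: compute each metric per class.
    for cls in classes:
        tp, tn, fp, fn = one_vs_rest_counts(predictions, ground_truth, cls)
        p = safe_div(tp, tp + fp)
        r = safe_div(tp, tp + fn)
        per_class["precision"].append(p)
        per_class["recall"].append(r)
        per_class["specificity"].append(safe_div(tn, tn + fp))
        per_class["f1"].append(safe_div(2 * p * r, p + r))
    # Step 2: take the unweighted mean over classes.
    return {name: sum(vals) / len(vals) for name, vals in per_class.items()}


predictions = [0, 1, 0, 1, 2, 2, 2, 1, 2, 0, 0, 2, 2, 1, 2]
ground_truth = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
macro = macro_metrics(predictions, ground_truth)
for name, value in macro.items():
    print(f"{name}: {value:.4f}")
```

Because every class carries equal weight, the small class 1 (two samples) influences the result as much as class 2 (ten samples), which is why the macro figures here fall below the 0.60 accuracy.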
Example
Below is an example of the classification metrics API usage:
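import requests

# CLUSTER_IP is a placeholder: set it to your cluster's IP address before running.
url = f"http://{CLUSTER_IP}/metrics-server/api/v1/metrics/pipeline/classification"

# Query parameters: the pipeline and model to evaluate, the time window, and the interval.
params = {
    'namespace': 'seldon',
    'pipelineName': 'iris-model-pipeline',
    'modelName': 'iris-model',
    'startTime': '2025-02-25T11:51:22Z',
    'endTime': '2025-02-25T11:53:22Z',
    'interval': '10s'
}
response = requests.get(url, params=params)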
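The `startTime` and `endTime` values in the example are UTC timestamps in the form `YYYY-MM-DDTHH:MM:SSZ`. A small helper can generate such a window instead of hard-coding it. This is a sketch using only the Python standard library; the helper name `time_window_params` is hypothetical, and whether the API accepts other timestamp formats is not covered by this page.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # format used in the example above


def time_window_params(end, window_seconds, interval="10s"):
    """Build startTime/endTime/interval values for a window ending at `end`."""
    if end.tzinfo is None:
        raise ValueError("end must be a timezone-aware datetime")
    end_utc = end.astimezone(timezone.utc)
    start_utc = end_utc - timedelta(seconds=window_seconds)
    return {
        "startTime": start_utc.strftime(TIMESTAMP_FORMAT),
        "endTime": end_utc.strftime(TIMESTAMP_FORMAT),
        "interval": interval,
    }


# Reproduces the two-minute window from the example request.
end = datetime(2025, 2, 25, 11, 53, 22, tzinfo=timezone.utc)
window = time_window_params(end, window_seconds=120)
print(window)
```

The resulting dictionary can be merged into `params` alongside `namespace`, `pipelineName`, and `modelName`.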