Classification Metrics

The module provides metrics for binary and multi-class classification. Multilabel classification is not supported.

Confusion Matrix

The confusion matrix reports the number of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) for each class, summarising the model’s classification performance.

The classification metrics API returns:

A list of categories (classes).
TP, TN, FP, and FN for each class.
A flattened confusion matrix.

Given the following confusion matrix:

The flattened confusion matrix values will be 3, 0, 0, 0, 2, 1, 0, 0, 4.
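The relationship between the flattened values and the per-class counts can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name `per_class_counts` is hypothetical, and the sketch assumes the values are flattened in row-major order with rows as ground truth and columns as predictions. Under that assumption it reproduces the per-class counts shown in the sample API response further down.

```python
def per_class_counts(values, n_classes):
    """Derive TP, FP, FN and TN per class from a flattened confusion matrix.

    Assumption: row-major order, rows = ground truth, columns = predictions.
    """
    # Rebuild the square matrix from the flat list.
    matrix = [values[r * n_classes:(r + 1) * n_classes] for r in range(n_classes)]
    total = sum(values)
    counts = []
    for k in range(n_classes):
        tp = matrix[k][k]                        # diagonal cell
        fn = sum(matrix[k]) - tp                 # rest of the row
        fp = sum(row[k] for row in matrix) - tp  # rest of the column
        tn = total - tp - fn - fp                # everything else
        counts.append({"tp": tp, "fp": fp, "fn": fn, "tn": tn})
    return matrix, counts


matrix, counts = per_class_counts([3, 0, 0, 0, 2, 1, 0, 0, 4], 3)
for row in matrix:
    print(row)
for class_index, c in enumerate(counts):
    print(class_index, c)
```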

Accuracy

Measures the fraction of correct predictions:

$$\texttt{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} 1(\hat{y}_i = y_i)$$

Where:

$\hat{y}_i$ is the predicted value,
$y_i$ is the ground truth value,
$n_\text{samples}$ is the total number of samples,
$1(\cdot)$ is the indicator function, equal to 1 when its argument is true and 0 otherwise.

Binary Classification

For the following predictions and ground truth of a binary classification problem:

Predictions: 1, 1, 0, 1, 0, 1, 0, 1, 1, 0
Ground truth: 0, 0, 0, 0, 0, 0, 0, 0, 1, 1

The accuracy is calculated as:

$$\texttt{accuracy} = \frac{4}{10} = 0.40$$

In this case, the accuracy metric is 0.40, meaning that 40% of the predictions match the ground truth.

Multi-Class Classification

For the following predictions and ground truth of a multi-class classification problem:

Predictions: 0, 1, 0, 1, 2, 2, 2, 1, 2, 0, 0, 2, 2, 1, 2
Ground truth: 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2

The accuracy is calculated as:

$$\texttt{accuracy} = \frac{9}{15} = 0.60$$

In this case, the accuracy metric is 0.60, meaning that 60% of the predictions match the ground truth.
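Both worked examples can be checked with a few lines of Python. The function below is a direct transcription of the accuracy formula, written for illustration only. The name `accuracy` and the input validation are choices made here, not part of the module's API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the ground truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


binary_pred = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]
binary_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

multi_pred = [0, 1, 0, 1, 2, 2, 2, 1, 2, 0, 0, 2, 2, 1, 2]
multi_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

print(accuracy(binary_true, binary_pred))  # 0.4
print(accuracy(multi_true, multi_pred))    # 0.6
```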

Precision, Recall, Specificity and F1

Definition

Precision: Evaluates the proportion of positive predictions that are correct:
$\texttt{precision} = \frac{TP}{TP + FP}$
Recall: Assesses the proportion of actual positives correctly identified:
$\texttt{recall} = \frac{TP}{TP + FN}$
Specificity: Measures the proportion of actual negatives correctly identified:
$\texttt{specificity} = \frac{TN}{TN + FP}$
F1 Score: Represents the harmonic mean of Precision and Recall, balancing their trade-offs:
$\texttt{F1} = 2 \times \frac{\texttt{precision} \times \texttt{recall}}{\texttt{precision} + \texttt{recall}}$
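As a worked illustration, take the binary example from the Accuracy section and treat class 1 as the positive class. This choice is made here only for illustration, since the module reports macro averages over both classes. Counting the predictions against the ground truth gives $TP = 1$, $FP = 5$, $FN = 1$ and $TN = 3$, so:

$$
\begin{aligned}
\texttt{precision} &= \frac{1}{1 + 5} = \frac{1}{6} \approx 0.167 \\
\texttt{recall} &= \frac{1}{1 + 1} = 0.5 \\
\texttt{specificity} &= \frac{3}{3 + 5} = 0.375 \\
\texttt{F1} &= 2 \times \frac{\frac{1}{6} \times \frac{1}{2}}{\frac{1}{6} + \frac{1}{2}} = 0.25
\end{aligned}
$$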

Average Type

Each metric is calculated using the macro averaging method (for both binary and multi-class classification), which involves the following steps:

First, the metric is calculated for each class.
Then, the unweighted mean of these individual metrics is computed.

The final formulas for each metric are as follows, where $x$ represents each class of $m$ classes:

$$
\begin{aligned}
\texttt{precision} &= \frac{1}{m} \sum_{x} \texttt{precision}_{x} \\
\texttt{recall} &= \frac{1}{m} \sum_{x} \texttt{recall}_{x} \\
\texttt{specificity} &= \frac{1}{m} \sum_{x} \texttt{specificity}_{x} \\
\texttt{F1} &= \frac{1}{m} \sum_{x} \texttt{F1}_{x} = \frac{1}{m} \sum_{x} \frac{2 \times \texttt{precision}_{x} \times \texttt{recall}_{x}}{\texttt{precision}_{x} + \texttt{recall}_{x}}
\end{aligned}
$$

Notes

When $TP + FP = 0$, precision is set to 0 and included in the average.
When $TP + FN = 0$, recall is set to 0 and included in the average.
When $TN + FP = 0$, specificity is set to 0 and included in the average.
When $TP + FN + FP = 0$, F1 score is set to 0 and included in the average.
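The macro-averaging steps and the zero-denominator rules above can be sketched in Python. This follows the rules as stated in these notes and is not taken from the service's implementation. The names `safe_div` and `macro_metrics` are hypothetical, and the sketch assumes the set of classes is whatever appears in the labels, whereas the service may use the model's declared categories.

```python
def safe_div(numerator, denominator):
    """Return 0.0 when the denominator is 0, as described in the notes."""
    return numerator / denominator if denominator else 0.0


def macro_metrics(y_true, y_pred):
    """Unweighted mean of per-class precision, recall, specificity and F1."""
    classes = sorted(set(y_true) | set(y_pred))  # assumption: classes seen in the data
    n = len(y_true)
    per_class = {"precision": [], "recall": [], "specificity": [], "f1": []}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tn = n - tp - fp - fn
        per_class["precision"].append(safe_div(tp, tp + fp))
        per_class["recall"].append(safe_div(tp, tp + fn))
        per_class["specificity"].append(safe_div(tn, tn + fp))
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and
        # recall, and its denominator is 0 exactly when TP + FN + FP = 0.
        per_class["f1"].append(safe_div(2 * tp, 2 * tp + fp + fn))
    return {name: sum(vals) / len(vals) for name, vals in per_class.items()}


multi_pred = [0, 1, 0, 1, 2, 2, 2, 1, 2, 0, 0, 2, 2, 1, 2]
multi_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
result = macro_metrics(multi_true, multi_pred)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

A class that never appears in either list is simply absent from this average. Whether the service counts such a class as a zero is not covered by this sketch.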

Example

Below is an example of the classification metrics API usage:

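import requests

# Replace the placeholder with the address of your cluster.
CLUSTER_IP = "<cluster-ip>"

url = f"http://{CLUSTER_IP}/metrics-server/api/v1/metrics/pipeline/classification"

# Pipeline and model to evaluate, the time range, and the interval between results.
params = {
    'namespace': 'seldon',
    'pipelineName': 'iris-model-pipeline',
    'modelName': 'iris-model',
    'startTime': '2025-02-25T11:51:22Z',
    'endTime': '2025-02-25T11:53:22Z',
    'interval': '10s'
}

response = requests.get(url, params=params)
response.raise_for_status()  # raise an exception on HTTP error responses
payload = response.json()    # parsed JSON body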

An example of the classification metrics API response:

{
  "metrics": [
    {
      "accuracy": 0,
      "confusionMatrix": {
        "categories": [
          "Setosa",
          "Versicolor",
          "Virginica"
        ],
        "computedConfusionValues": [
          {
            "falseNegativeCount": 10,
            "falsePositiveCount": 0,
            "trueNegativeCount": 0,
            "truePositiveCount": 0
          },
          {
            "falseNegativeCount": 0,
            "falsePositiveCount": 10,
            "trueNegativeCount": 0,
            "truePositiveCount": 0
          },
          {
            "falseNegativeCount": 0,
            "falsePositiveCount": 0,
            "trueNegativeCount": 10,
            "truePositiveCount": 0
          }
        ],
        "values": [
          0,
          10,
          0,
          0,
          0,
          0,
          0,
          0,
          0
        ]
      },
      "endTime": "2025-02-25T11:51:32Z",
      "f1": 0,
      "precision": 0,
      "recall": 0,
      "specificity": 0.5
    },
    {
      "accuracy": 0,
      "confusionMatrix": {
        "categories": [
          "Setosa",
          "Versicolor",
          "Virginica"
        ],
        "computedConfusionValues": [
          {
            "falseNegativeCount": 16,
            "falsePositiveCount": 0,
            "trueNegativeCount": 0,
            "truePositiveCount": 0
          },
          {
            "falseNegativeCount": 0,
            "falsePositiveCount": 16,
            "trueNegativeCount": 0,
            "truePositiveCount": 0
          },
          {
            "falseNegativeCount": 0,
            "falsePositiveCount": 0,
            "trueNegativeCount": 16,
            "truePositiveCount": 0
          }
        ],
        "values": [
          0,
          16,
          0,
          0,
          0,
          0,
          0,
          0,
          0
        ]
      },
      "endTime": "2025-02-25T11:51:42Z",
      "f1": 0,
      "precision": 0,
      "recall": 0,
      "specificity": 0.5
    },
    {
      "accuracy": 0.4375,
      "confusionMatrix": {
        "categories": [
          "Setosa",
          "Versicolor",
          "Virginica"
        ],
        "computedConfusionValues": [
          {
            "falseNegativeCount": 9,
            "falsePositiveCount": 0,
            "trueNegativeCount": 7,
            "truePositiveCount": 0
          },
          {
            "falseNegativeCount": 0,
            "falsePositiveCount": 9,
            "trueNegativeCount": 0,
            "truePositiveCount": 7
          },
          {
            "falseNegativeCount": 0,
            "falsePositiveCount": 0,
            "trueNegativeCount": 16,
            "truePositiveCount": 0
          }
        ],
        "values": [
          0,
          9,
          0,
          0,
          7,
          0,
          0,
          0,
          0
        ]
      },
      "endTime": "2025-02-25T11:51:52Z",
      "f1": 0.46666667,
      "precision": 0.4375,
      "recall": 0.5,
      "specificity": 0.6666667
    },
    {
      "accuracy": 0.125,
      "confusionMatrix": {
        "categories": [
          "Setosa",
          "Versicolor",
          "Virginica"
        ],
        "computedConfusionValues": [
          {
            "falseNegativeCount": 2,
            "falsePositiveCount": 0,
            "trueNegativeCount": 6,
            "truePositiveCount": 0
          },
          {
            "falseNegativeCount": 0,
            "falsePositiveCount": 7,
            "trueNegativeCount": 0,
            "truePositiveCount": 1
          },
          {
            "falseNegativeCount": 5,
            "falsePositiveCount": 0,
            "trueNegativeCount": 3,
            "truePositiveCount": 0
          }
        ],
        "values": [
          0,
          2,
          0,
          0,
          1,
          0,
          0,
          5,
          0
        ]
      },
      "endTime": "2025-02-25T11:52:02Z",
      "f1": 0.18181819,
      "precision": 0.125,
      "recall": 0.33333334,
      "specificity": 0.6666667
    },
    {
      "accuracy": -1,
      "confusionMatrix": {
        "categories": [],
        "computedConfusionValues": [],
        "values": []
      },
      "endTime": "2025-02-25T11:52:12Z",
      "f1": -1,
      "precision": -1,
      "recall": -1,
      "specificity": -1
    },
    {
      "accuracy": -1,
      "confusionMatrix": {
        "categories": [],
        "computedConfusionValues": [],
        "values": []
      },
      "endTime": "2025-02-25T11:52:22Z",
      "f1": -1,
      "precision": -1,
      "recall": -1,
      "specificity": -1
    },
    {
      "accuracy": -1,
      "confusionMatrix": {
        "categories": [],
        "computedConfusionValues": [],
        "values": []
      },
      "endTime": "2025-02-25T11:52:32Z",
      "f1": -1,
      "precision": -1,
      "recall": -1,
      "specificity": -1
    },
    {
      "accuracy": -1,
      "confusionMatrix": {
        "categories": [],
        "computedConfusionValues": [],
        "values": []
      },
      "endTime": "2025-02-25T11:52:42Z",
      "f1": -1,
      "precision": -1,
      "recall": -1,
      "specificity": -1
    },
    {
      "accuracy": -1,
      "confusionMatrix": {
        "categories": [],
        "computedConfusionValues": [],
        "values": []
      },
      "endTime": "2025-02-25T11:52:52Z",
      "f1": -1,
      "precision": -1,
      "recall": -1,
      "specificity": -1
    },
    {
      "accuracy": -1,
      "confusionMatrix": {
        "categories": [],
        "computedConfusionValues": [],
        "values": []
      },
      "endTime": "2025-02-25T11:53:02Z",
      "f1": -1,
      "precision": -1,
      "recall": -1,
      "specificity": -1
    },
    {
      "accuracy": -1,
      "confusionMatrix": {
        "categories": [],
        "computedConfusionValues": [],
        "values": []
      },
      "endTime": "2025-02-25T11:53:12Z",
      "f1": -1,
      "precision": -1,
      "recall": -1,
      "specificity": -1
    },
    {
      "accuracy": -1,
      "confusionMatrix": {
        "categories": [],
        "computedConfusionValues": [],
        "values": []
      },
      "endTime": "2025-02-25T11:53:22Z",
      "f1": -1,
      "precision": -1,
      "recall": -1,
      "specificity": -1
    }
  ]
}
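A response like the one above could be post-processed as sketched below. In the sample, some entries have an empty confusion matrix and report every metric as -1. Reading those entries as "no data for this interval" is an inference from the sample rather than documented behaviour, so the sketch keys off the empty `values` list instead of the -1 itself. The function name `windows_with_data` is hypothetical, and `sample` is a trimmed copy of two entries from the response.

```python
def windows_with_data(payload):
    """Return (endTime, accuracy, f1) for entries with a non-empty confusion matrix."""
    rows = []
    for window in payload.get("metrics", []):
        # Skip entries whose confusion matrix has no values.
        if not window["confusionMatrix"]["values"]:
            continue
        rows.append((window["endTime"], window["accuracy"], window["f1"]))
    return rows


# Trimmed copy of two entries from the example response.
sample = {
    "metrics": [
        {
            "accuracy": 0.4375,
            "confusionMatrix": {
                "categories": ["Setosa", "Versicolor", "Virginica"],
                "values": [0, 9, 0, 0, 7, 0, 0, 0, 0],
            },
            "endTime": "2025-02-25T11:51:52Z",
            "f1": 0.46666667,
        },
        {
            "accuracy": -1,
            "confusionMatrix": {"categories": [], "values": []},
            "endTime": "2025-02-25T11:52:12Z",
            "f1": -1,
        },
    ]
}

rows = windows_with_data(sample)
for end_time, acc, f1 in rows:
    print(end_time, acc, f1)
```

With a live request, the parsed body from `response.json()` would be passed in place of `sample`.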
