Confusion Matrix

Not really a metric, but fundamental for most metrics related to classification

A confusion matrix is not a metric in itself, but many metrics are calculated on top of it because it gives a clear picture of a model's performance. This is why it is important to understand it before diving into the metrics themselves.

A confusion matrix is calculated by comparing the predictions of a classifier to the ground truth of the test or validation data set for a given class.

Example binary confusion matrix (image source: https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/)

Interpretation / calculation

The rows and columns are divided into 'no' and 'yes', indicating whether a sample belongs to the class in question. The rows correspond to the ground-truth labels, and the columns to the predictions.

For classifiers, the confusion matrix is filled by simply counting how often each prediction/ground-truth combination occurs and entering those counts into the table.
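As a minimal sketch of this counting step (plain Python; the label lists are hypothetical), assuming a binary 'yes'/'no' classification problem:

from collections import Counter
# Hypothetical ground-truth labels and predictions
ground_truth = ["yes", "no", "yes", "yes", "no", "no"]
predictions = ["yes", "no", "no", "yes", "yes", "no"]
# Count every (ground truth, prediction) combination
counts = Counter(zip(ground_truth, predictions))
# Arrange the counts as a table: rows = ground truth, columns = predictions
for actual in ["no", "yes"]:
    print(actual, [counts[(actual, predicted)] for predicted in ["no", "yes"]])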

For instance and semantic segmentation models, as well as object detectors, the confusion matrix is calculated by first checking whether the predicted class matches the ground truth, and then whether the IoU (Intersection over Union) between the prediction and the ground truth is above a certain threshold; 0.5 is a common choice.
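A small sketch of this matching step for axis-aligned bounding boxes (the [x1, y1, x2, y2] box format, the example boxes, and the 0.5 threshold are assumptions for illustration):

def iou(box_a, box_b):
    # Boxes are given as [x1, y1, x2, y2]
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)
# A prediction only counts as a match if the class is correct
# and the IoU with the ground-truth box is above the threshold
pred = {"class": "car", "box": [10, 10, 50, 50]}
gt = {"class": "car", "box": [12, 12, 48, 52]}
print(pred["class"] == gt["class"] and iou(pred["box"], gt["box"]) >= 0.5)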

In the example above, the classifier made 165 predictions in total. In the ground truth, 60 samples are negative and 105 are positive, whereas the model predicted 55 samples as negative and 110 as positive. Ideally, these numbers would match.
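To make the arithmetic concrete, assume the individual cell values from the linked example (TN = 50, FP = 10, FN = 5, TP = 100); the row sums then give the ground-truth totals and the column sums the prediction totals:

# Cell values assumed from the linked example; rows = ground truth, columns = predictions
tn, fp = 50, 10    # ground-truth 'no':  60 samples
fn, tp = 5, 100    # ground-truth 'yes': 105 samples
print(tn + fp, fn + tp)     # 60 105 (ground-truth totals)
print(tn + fn, fp + tp)     # 55 110 (prediction totals)
print(tn + fp + fn + tp)    # 165 predictions in total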

From the confusion matrix, we can derive four types of predictions. The ones we want to see:

  • True positives (TP): samples which the model predicted as belonging to the class in question and which actually belong to the class according to ground truth.

  • True negatives (TN): samples which the model predicted as not being part of the class and which are negative in the ground truth as well.

And the ones we don't want to see:

  • False positives (FP): samples which the model predicted as part of the class but which actually aren't (Type I Error).

  • False negatives (FN): samples which the model predicted as negative but which actually belong to the class in question (Type II Error).

The absolute numbers of a confusion matrix are not straightforward to interpret, but as mentioned above, they are used to calculate more interpretable metrics. Check the further resources section for more details on the specific metrics.
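As a minimal sketch of how the four counts are read off a binary confusion matrix and turned into a derived metric (the data is hypothetical, scikit-learn's row/column convention is assumed, and accuracy serves only as one example; the framework-specific snippets follow below):

from sklearn.metrics import confusion_matrix
# Hypothetical binary ground truth and predictions
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
# Example of a metric derived from these counts
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("Accuracy:", accuracy)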

Code example

PyTorch
!pip install torchmetrics
import torch
from torchmetrics import ConfusionMatrix
# Ground-truth labels and model predictions for a binary problem
target = torch.tensor([1, 1, 0, 0])
preds = torch.tensor([0, 1, 0, 0])
# Recent torchmetrics versions require specifying the task explicitly
confmat = ConfusionMatrix(task="binary")
print(confmat(preds, target))
Sklearn
from sklearn.metrics import confusion_matrix
# Ground-truth labels and predictions for a three-class problem
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
# Rows correspond to the true classes, columns to the predicted classes
print(confusion_matrix(y_true, y_pred))
TensorFlow
import tensorflow as tf
# Initializing the input tensors: ground-truth labels and predictions
labels = tf.constant([1, 3, 4], dtype=tf.int32)
predictions = tf.constant([1, 2, 3], dtype=tf.int32)
# Printing the input tensors
print('Labels: ', labels)
print('Predictions: ', predictions)
# Evaluating the confusion matrix
res = tf.math.confusion_matrix(labels, predictions)
# Printing the result
print('Confusion matrix: ', res)

Metrics based on the confusion matrix