For object detectors, as well as instance and semantic segmentors, each prediction is assigned to a cell of the confusion matrix in two steps: first check whether the predicted class matches the ground truth, then check whether the intersection over union (IoU) between the prediction and the ground truth is above a chosen threshold. A threshold of 0.5 is common.
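The two-step check can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `iou` and `count_detections`, the `(x1, y1, x2, y2)` box format, and the greedy one-to-one matching are all assumptions made for the example. Real evaluation toolkits usually also sort predictions by confidence score before matching.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at 0 so disjoint boxes get no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_detections(predictions, ground_truths, iou_threshold=0.5):
    """Count (TP, FP, FN) for one image.

    predictions and ground_truths are lists of (class_label, box) pairs.
    Each ground-truth object can be matched by at most one prediction.
    """
    matched = set()
    tp = fp = 0
    for pred_label, pred_box in predictions:
        best_iou, best_idx = 0.0, None
        for idx, (gt_label, gt_box) in enumerate(ground_truths):
            # Step 1: the class must match (and the object must still be free).
            if idx in matched or gt_label != pred_label:
                continue
            overlap = iou(pred_box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # Step 2: the overlap must reach the threshold. Whether the comparison
        # is strict (>) or not (>=) varies between benchmarks; >= is assumed here.
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    # Ground-truth objects that no prediction matched are false negatives.
    fn = len(ground_truths) - len(matched)
    return tp, fp, fn


ground_truths = [("cat", (0, 0, 10, 10)), ("dog", (20, 20, 30, 30))]
predictions = [
    ("cat", (1, 1, 10, 10)),     # right class, IoU 0.81 -> TP
    ("dog", (0, 0, 5, 5)),       # right class, no overlap with the dog -> FP
    ("cat", (20, 20, 30, 30)),   # perfect overlap with the dog, wrong class -> FP
]
print(count_detections(predictions, ground_truths))  # (1, 2, 1)
```

The last prediction shows why the class check comes first: a perfectly placed box with the wrong label still counts as a false positive, and the dog it covers remains a false negative.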
Accuracy is very intuitive, but it has one drawback: it doesn't tell you what kind of errors your model makes. At a 1% misclassification rate (99% accuracy), the errors could be false positives (FP), false negatives (FN), or a mix of both. This information is important when you're evaluating a model for a specific use case. Take COVID tests as an example: you'd rather have FPs than FNs, because a missed infection can go untreated and spread, whereas a false alarm can usually be cleared up with a follow-up test.
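To make this concrete, here is a small sketch with made-up data: two hypothetical binary classifiers that both reach 99% accuracy on the same 100 samples, one making a single false positive and the other a single false negative. The labels, predictions, and helper names are invented for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up data: 10 positive samples followed by 90 negative samples.
y_true = [1] * 10 + [0] * 90

# Model A flags one negative sample as positive (one FP).
model_a = [1] * 10 + [1] + [0] * 89
# Model B misses one positive sample (one FN).
model_b = [0] + [1] * 9 + [0] * 90

for name, y_pred in [("A", model_a), ("B", model_b)]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(name, accuracy(y_true, y_pred), "FP:", fp, "FN:", fn)
# A 0.99 FP: 1 FN: 0
# B 0.99 FP: 0 FN: 1
```

Both models look identical by accuracy, but only model A finds every positive case. In the COVID test analogy, model B is the one that sends an infected person home.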