Quality metrics for bounding boxes

In this chapter, we’ll explore the different metrics we can use to evaluate the quality of our bounding boxes.

Precision

Precision is the metric that tells us how good our model is at detecting objects and distinguishing them from the background or from other classes.

$$ \dfrac{TP}{TP + FP} $$

So, we compute it by dividing the True Positives (correct detections) by the sum of the correct detections and the incorrect detections (False Positives).

Notice that this metric doesn’t tell us whether we detected all the objects, only whether the detections we did make are correct. If we detect only 10 of the 100 objects of class A that we should have found, but all 10 of those detections are correct, we get a precision of 100%! This is why we also use Recall, which in this example would be 10%.
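As a minimal sketch (with made-up counts, just to reproduce the example above), precision can be computed directly from the formula:

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): the fraction of detections that are correct."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

# Example from the text: only 10 of the 100 class-A objects are detected,
# but all 10 detections are correct.
tp, fp = 10, 0
print(precision(tp, fp))  # 1.0 -> 100% precision, even though 90 objects were missed
```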

Recall

Recall is the metric that tells us how good the model is at detecting all the objects of interest, rather than just whether the detections it does make are correct.

$$ \dfrac{TP}{TP + FN} = \dfrac{TP}{GT} $$

It is computed by dividing the correct detections by the correct detections plus the detections we missed (False Negatives). This is the same as the correct detections divided by the number of ground truths.

Note that this metric doesn’t tell us whether we detected something that isn’t an object. If we have 100 objects but produce 1000 detections, and the 100 real objects are all correctly detected among them, we get a Recall of 100%! This is why we also use precision, which in this example would be 10%.
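The same kind of sketch works for recall (again with made-up counts matching the example above):

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN) = TP / GT: the fraction of ground-truth objects found."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# Example from the text: 100 ground-truth objects, 1000 detections,
# and all 100 real objects are correctly detected (so 900 are false positives).
tp, fp, fn = 100, 900, 0
print(recall(tp, fn))  # 1.0 -> 100% recall
print(tp / (tp + fp))  # 0.1 -> 10% precision
```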

Mean Average Precision 50-95 - Quality of BBoxes

Average Precision is the area under the Precision - Recall curve. To compute AP we will need to compute the Precision and Recall over all the predictions the model has made on all the images of the dataset.
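To illustrate the “area under the curve” idea, here is a minimal sketch that integrates a precision-recall curve numerically. It assumes we already have the precision/recall pairs produced by the ranking procedure described next, and it uses the common all-points interpolation (an assumption, since the text does not fix a specific interpolation scheme); the example values are placeholders.

```python
import numpy as np

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """Approximate the area under the precision-recall curve."""
    # Pad the curve so it starts at recall 0 and ends at recall 1.
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))

    # Replace each precision by the maximum precision at any higher recall
    # (interpolated precision, monotonically decreasing).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])

    # Sum precision * width of each recall step.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Placeholder PR curve, just to show the computation.
recalls = np.array([0.1, 0.4, 0.7, 0.9])
precisions = np.array([1.0, 0.8, 0.6, 0.5])
print(average_precision(recalls, precisions))
```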

First of all, to make the computation efficient, we’ll group all the predictions into a dictionary keyed by object class, where each value is a list of tuples (prediction bbox, image, confidence). We’ll also build a dictionary keyed by object class whose values are dictionaries keyed by image id, each containing the list of Annotations of that image.
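A minimal sketch of building those two lookup structures, assuming hypothetical `Prediction` and `Annotation` records (the class and field names here are placeholders, not a fixed API):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Annotation:
    class_id: int
    bbox: tuple  # (x_min, y_min, x_max, y_max)

@dataclass
class Prediction:
    class_id: int
    bbox: tuple
    confidence: float

def group_by_class(predictions_per_image, annotations_per_image):
    """Build the two lookups: predictions per class, and annotations per class and image.

    predictions_per_image: dict image_id -> list[Prediction]
    annotations_per_image: dict image_id -> list[Annotation]
    """
    # class_id -> list of (bbox, image_id, confidence)
    preds_by_class = defaultdict(list)
    for image_id, preds in predictions_per_image.items():
        for p in preds:
            preds_by_class[p.class_id].append((p.bbox, image_id, p.confidence))

    # class_id -> image_id -> list of Annotation
    gts_by_class = defaultdict(lambda: defaultdict(list))
    for image_id, anns in annotations_per_image.items():
        for a in anns:
            gts_by_class[a.class_id][image_id].append(a)

    return preds_by_class, gts_by_class
```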

So, to compute the AP of one class, we select the corresponding list of tuples (prediction bbox, image, confidence) and sort it by descending confidence.
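Continuing the sketch above (and reusing its hypothetical structures), selecting one class and ordering its predictions by descending confidence could look like this:

```python
def predictions_for_class(preds_by_class, class_id):
    """Return the class's (bbox, image_id, confidence) tuples, highest confidence first."""
    return sorted(preds_by_class[class_id], key=lambda t: t[2], reverse=True)
```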