Last time, we discussed how computer vision can learn from everything we see and what to do when little data is available. Now that we have enough data, the next step is to decide what type of model we want to train. Two questions arise: “How long do we have to train the model?” and “How do we know whether we have trained it well enough?” In this blog, we will show you how to check the performance of a model, and how to quantify that performance to prove that a new model is trained better than its predecessor.


A model’s detections can be grouped into four categories: true positive, true negative, false positive and false negative. The first two mean that the detection was correct. The last two are a bit trickier: a false positive means that the model detects something where there is nothing, whereas a false negative means that the model fails to detect the object(s) that are actually in the image.

The image above shows a confusion matrix for a classification model. A well-performing model produces many true positive and true negative predictions, meaning it assigns the correct labels to the images. To make this easier to understand, let’s say we want to train a model that detects whether persons are wearing protective helmets.

  • True positive – the model correctly identifies persons wearing helmets.

  • True negative – the model correctly identifies persons not wearing helmets.

  • False positive – the model predicts that persons are wearing helmets, while in fact they aren’t.

  • False negative – the model predicts that persons are not wearing helmets, while in fact they are.

The previous example was binary classification, where there are only two classes (helmet vs. no helmet), but a confusion matrix can also be used when there are more classes. The figure below shows a confusion matrix for 10 classes.
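To make the four categories concrete, here is a minimal sketch that counts them for the helmet example. The labels below are hypothetical (1 = wearing a helmet, 0 = not wearing one), and real projects would typically use a library such as scikit-learn instead of counting by hand:

```python
# Hypothetical ground-truth labels and model predictions:
# 1 = wearing a helmet, 0 = not wearing a helmet.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1

# The diagonal of the confusion matrix (TP + TN) over all
# predictions gives the accuracy.
accuracy = (tp + tn) / len(y_true)  # 0.75
```

The same idea extends to more classes: the matrix then has one row per true class and one column per predicted class, and the diagonal still holds the correct predictions.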

When to stop training your model?

Now we know how to check whether a model performs well. But it’s not time to celebrate yet! The next question is: which data should we use to test the model’s performance? If we use data that was already used for training, the model will obviously appear to perform great, but that is no guarantee it will do well on real-life data. This is what we call “overfitting”. It usually occurs when we have too little training data, or when we train on the same data for too long, so that the weights of the model become too specialized to the training data. So how do we fix this? By simply stopping the training earlier, right? Yes, but when is “early enough”? If we stop too early, we get what we call “underfitting”, which means we have not trained the model long enough.
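One common way to decide when “early enough” is, is early stopping: keep training while the validation loss improves, and stop once it hasn’t improved for a number of epochs. The sketch below is framework-agnostic; `train_one_epoch` and `validate` are placeholders for your own training code, and the `patience` value is an assumption you would tune per project:

```python
def train_with_early_stopping(train_one_epoch, validate,
                              max_epochs=100, patience=5):
    """Stop when the validation loss hasn't improved for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best_loss:
            best_loss = val_loss            # still improving: keep going
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                           # likely past the sweet spot
    return best_loss, epoch + 1

# Simulated run: the validation loss improves, then starts degrading.
losses = iter([1.0, 0.8, 0.9, 0.95, 1.0])
best, epochs = train_with_early_stopping(
    train_one_epoch=lambda: None,
    validate=lambda: next(losses),
    max_epochs=10, patience=3)
print(best, epochs)  # 0.8 5
```

Most deep learning frameworks ship this as a ready-made callback, so in practice you rarely write it yourself.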

One way to counter underfitting and overfitting is to keep some of our annotated data out of the training loop (many other techniques exist to counter under-/overfitting, but we’ll keep those for the next blog). These images, unseen during training, form the test set. We process them with the trained model to check how well it works. The predictions on these images can be visualized in a confusion matrix, and now we can see how well the model generalizes (how well it works on unseen data). What if, after some time, we notice that the model does not work well? Do we have to restart the complete training? No! We can always continue training the model from the last checkpoint with some extra data.
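Splitting off that test set can be as simple as shuffling the annotated images and holding out a fixed fraction. A minimal sketch, assuming hypothetical image filenames and an 80/20 split (both assumptions, not fixed rules):

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=42):
    """Hold out a fraction of the annotated data as an unseen test set."""
    rng = random.Random(seed)       # fixed seed makes the split reproducible
    shuffled = samples[:]           # copy so the original list is untouched
    rng.shuffle(shuffled)
    split = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:split], shuffled[split:]

images = [f"img_{i:03d}.jpg" for i in range(100)]  # hypothetical filenames
train_set, test_set = train_test_split(images)
print(len(train_set), len(test_set))  # 80 20
```

The important property is that the two sets never overlap: every test image must be completely unseen during training, otherwise the confusion matrix will paint too rosy a picture.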

Underfitting, appropriate fitting and overfitting.


As you can see, training and evaluating the performance of a model is not as straightforward as it looks at first glance. And as always, there is no one-size-fits-all technique for every situation. At Mediaan Conclusion, we always experiment with different techniques to train the model as well as possible for your use case! Not sure how to start your computer vision adventure? Our experts are always ready to help! Have a look at our related blogs to find out about the numerous opportunities that computer vision can offer your company: