Computer Vision: What To Do If There Is Little Data?

In the previous blog, How Computer Vision Can Learn Everything That We See, we went over what type of data an image is and how we can interpret it. We also talked about how to label images and how these labels are used to train a model. But what if there is little to no data available? Is there a way to generate new data from existing images? And when do we have enough? Thankfully, we can address this problem with data augmentation, a technique that alters existing images to create new training data. Some examples are scaling, rotating, and stretching, along with many more advanced techniques. In this blog we will go over a few of these techniques and explain how they work and when to use them.

How does data augmentation work?

When training the first models, there is often insufficient data available. Perhaps the cameras have only just been installed, or the data does not contain enough variation (e.g. only daytime images and no night images). To handle this limitation, we generate extra data so that we have enough training examples to create a model that is robust and accurate in the varying environments that occur in practice.

Now let’s talk about numbers! If we use 7 techniques (explained below) and apply each technique just once, we can generate 7 new images for each existing image. However, we can also apply each technique several times with different settings. For example, when using rotation we can turn the image 90 degrees clockwise, but also 45 degrees, or 180, and so on. As you can see, we can generate a lot of new images from a single image. The question is: how many images do we need to generate from each image? And when have we generated too many images from the original? This all depends on the application, because not all data augmentation techniques work for all applications. For example, we may not want to rotate the image if we are confident that the object is always oriented in the same way.
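To make the counting concrete, here is a minimal sketch in Python. The function name and the per-technique settings are hypothetical and only illustrate the arithmetic: each setting of each technique is applied once to every original image, and chained combinations of techniques are not counted.

```python
def count_augmented_images(n_originals, settings_per_technique):
    """Count new images when every setting of every technique is applied once per original."""
    new_per_original = sum(settings_per_technique.values())
    return n_originals * new_per_original


# Hypothetical configuration: seven techniques, one setting each.
one_setting_each = {
    "cropping": 1, "scaling": 1, "color": 1, "stretching": 1,
    "rotation": 1, "noise": 1, "blur": 1,
}
# Same techniques, but rotation now uses three angles (45, 90 and 180 degrees).
three_rotations = {**one_setting_each, "rotation": 3}

print(count_augmented_images(1, one_setting_each))  # 7 new images from one original
print(count_augmented_images(1, three_rotations))   # 9 new images from one original
```

The count grows quickly once techniques get several settings, which is exactly why the choice of techniques and settings has to be tuned per application.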

Technique overview

Cropping

The first technique is cropping: the program cuts out a part of the image and turns it into a new training example. This technique is useful in object detection (for example, detecting all the pigs in the frame) when an object is not entirely in the frame. The question here is: do we want to detect partial objects, or only complete objects? The model needs at least one example of every object it should detect, including examples in which the object is not completely visible.
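A random crop can be sketched with NumPy as follows. This is an illustration under assumptions, not a description of a specific pipeline: the image is assumed to be an array of shape (height, width) or (height, width, channels), and `random_crop` is a hypothetical helper name.

```python
import numpy as np


def random_crop(image, crop_height, crop_width, rng=None):
    """Cut a random window of the requested size out of the image."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    if crop_height > height or crop_width > width:
        raise ValueError("crop size must not exceed the image size")
    # Pick the top-left corner so that the window stays inside the image.
    top = int(rng.integers(0, height - crop_height + 1))
    left = int(rng.integers(0, width - crop_width + 1))
    return image[top:top + crop_height, left:left + crop_width].copy()
```

For object detection, the labels have to follow the crop as well: bounding boxes would need to be shifted by the same offset and clipped to the new borders, which this sketch leaves out.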

Scaling

Scaling is another useful technique. It works by removing pixels from the image (downsampling) or adding pixels to it (upsampling). This makes the model generalize better: it is more likely to work well on data in production that it has not seen during training. The model can now detect objects when they are further away from or closer to the camera. It can also keep detecting objects when we move to a new camera with a higher or lower resolution.
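As a rough illustration, rescaling can be sketched with nearest-neighbor sampling in NumPy. The helper name `rescale_nearest` is hypothetical, and nearest-neighbor is chosen here only because it is the simplest option; image libraries usually offer smoother interpolation methods.

```python
import numpy as np


def rescale_nearest(image, factor):
    """Resize an image by `factor` using nearest-neighbor sampling."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    height, width = image.shape[:2]
    new_height = max(1, int(round(height * factor)))
    new_width = max(1, int(round(width * factor)))
    # For every output row/column, look up the closest source row/column.
    rows = np.minimum((np.arange(new_height) / factor).astype(int), height - 1)
    cols = np.minimum((np.arange(new_width) / factor).astype(int), width - 1)
    return image[rows][:, cols]
```

A factor below 1 drops pixels (the object appears smaller, as if further away), while a factor above 1 repeats pixels (the object appears larger).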

Color

You can also generate extra data by changing the color space, exposure, or contrast. A well-known example is converting to grayscale. It is also possible to remove a color channel from the image, leaving only the BR, BG, or RG channels. The reason for doing this is that the model should not depend on the colors used in the images. By changing the color space of the training images, the model can work in many different settings and lighting conditions. The model becomes more robust and can also work with night vision images, or when there is more or less light.
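The color augmentations above can be sketched in NumPy as shown below. The sketch assumes 8-bit images in RGB channel order; the function names are hypothetical, and the grayscale weights are the common ITU-R BT.601 luma weights, which may differ from what a given pipeline uses.

```python
import numpy as np


def to_grayscale(image):
    """Convert an RGB image to a single-channel grayscale image."""
    weights = np.array([0.299, 0.587, 0.114])  # BT.601 luma weights for R, G, B
    gray = image[..., :3].astype(np.float32) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def drop_channel(image, channel):
    """Zero out one color channel (0 = R, 1 = G, 2 = B under the RGB assumption)."""
    result = image.copy()
    result[..., channel] = 0
    return result


def adjust_brightness_contrast(image, brightness=0.0, contrast=1.0):
    """Scale contrast around mid-gray, then shift brightness; values are clipped to 0..255."""
    adjusted = (image.astype(np.float32) - 128.0) * contrast + 128.0 + brightness
    return np.clip(adjusted, 0, 255).astype(np.uint8)
```

Lowering the brightness gives a crude stand-in for darker scenes, although real night vision footage differs in more ways than brightness alone.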

Stretching (elastic form)

Stretching is the technique where either the width or the height of the image is changed to get a new perspective on the data. The image gets stretched in one direction, much like stretching a rubber band. But why would we use this technique? It can be used to make the model handle distorted images. Sometimes security cameras change the perspective or have a fisheye effect. To make sure that the model can handle these kinds of images, you can add stretched images to the training set.
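A simple one-directional stretch can be sketched as below. The helper name `stretch` is hypothetical, and the sketch is a plain linear stretch with nearest-neighbor sampling; it does not model a real fisheye lens or a true elastic deformation, both of which need a non-linear mapping.

```python
import numpy as np


def stretch(image, width_factor=1.0, height_factor=1.0):
    """Stretch or squeeze an image along one or both axes independently."""
    if width_factor <= 0 or height_factor <= 0:
        raise ValueError("factors must be positive")
    height, width = image.shape[:2]
    new_height = max(1, int(round(height * height_factor)))
    new_width = max(1, int(round(width * width_factor)))
    # Map every output row/column back to a source row/column.
    rows = np.minimum(np.arange(new_height) * height // new_height, height - 1)
    cols = np.minimum(np.arange(new_width) * width // new_width, width - 1)
    return image[rows][:, cols]
```

Calling `stretch(image, width_factor=1.5)` widens the image while keeping its height, so objects change their aspect ratio.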

Rotation

The image can be rotated clockwise or counterclockwise. This can be very useful when we have many images of objects that are all oriented in the same way. After training on rotated images, the model no longer depends on objects having the same orientation as in the training images and can also detect objects that are rotated in the frame.
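Rotation by an arbitrary angle can be sketched in NumPy with an inverse mapping: for every output pixel, we look up which source pixel lands there. The helper `rotate` is hypothetical; it keeps the original image size, uses nearest-neighbor sampling, and fills the uncovered corners with a constant, all of which are simplifying assumptions.

```python
import numpy as np


def rotate(image, angle_degrees, fill=0):
    """Rotate around the image center; positive angles turn counterclockwise as displayed."""
    height, width = image.shape[:2]
    theta = np.deg2rad(angle_degrees)
    center_y, center_x = (height - 1) / 2.0, (width - 1) / 2.0
    ys, xs = np.indices((height, width))
    dx, dy = xs - center_x, ys - center_y
    # Inverse mapping: source coordinates for every output pixel.
    src_x = np.rint(np.cos(theta) * dx - np.sin(theta) * dy + center_x).astype(int)
    src_y = np.rint(np.sin(theta) * dx + np.cos(theta) * dy + center_y).astype(int)
    valid = (src_x >= 0) & (src_x < width) & (src_y >= 0) & (src_y < height)
    rotated = np.full_like(image, fill)
    rotated[valid] = image[src_y[valid], src_x[valid]]
    return rotated
```

For multiples of 90 degrees, `np.rot90` gives the same result without any sampling. As with cropping, bounding boxes in object detection would have to be rotated along with the image.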

Noise

The recording conditions are not always perfect in a production environment; think of poorly lit scenes. In such cases the image quality is poor, but we still want the model to find what we are looking for. For a human, it is still easy to understand what is visible in the image, but for a computer, this is something completely new! To counter this and make the model more robust, we can generate images with random noise added.
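One common way to do this is additive Gaussian noise, sketched below. The choice of Gaussian noise, the helper name `add_gaussian_noise`, and the default `sigma` are assumptions for illustration; 8-bit images are assumed.

```python
import numpy as np


def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to an 8-bit image."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=image.shape)
    noisy = image.astype(np.float32) + noise
    # Round and clip so the result is a valid 8-bit image again.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```

A larger `sigma` gives a grainier image; the value that resembles the real camera noise best has to be found by experiment.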

Blur

What if the image is not sharp because the camera was not focused? We still want the model to detect the objects. With this technique, we add random blur to the images.
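A simple way to sketch random blur is a box blur (each pixel becomes the average of its neighborhood) with a randomly chosen kernel size. The helper names and the use of a box filter are assumptions; a Gaussian blur from an image library would be a common alternative.

```python
import numpy as np


def box_blur(image, kernel_size=3):
    """Replace each pixel by the mean of its kernel_size x kernel_size neighborhood."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd number")
    pad = kernel_size // 2
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (image.ndim - 2)
    # Repeat the border pixels so the output keeps the input size.
    padded = np.pad(image.astype(np.float32), pad_width, mode="edge")
    height, width = image.shape[:2]
    total = np.zeros(image.shape, dtype=np.float32)
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            total += padded[dy:dy + height, dx:dx + width]
    return np.clip(np.rint(total / kernel_size**2), 0, 255).astype(np.uint8)


def random_blur(image, max_kernel_size=7, rng=None):
    """Blur with a random odd kernel size between 1 (no blur) and max_kernel_size."""
    rng = np.random.default_rng() if rng is None else rng
    sizes = np.arange(1, max_kernel_size + 1, 2)
    return box_blur(image, int(rng.choice(sizes)))
```

Because a kernel size of 1 leaves the image unchanged, `random_blur` produces a mix of sharp and blurred training images.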

Is data augmentation limited to computer vision?

The answer is no. Data augmentation can also be used to generate new data in other machine learning applications. At Mediaan Conclusion we always use data augmentation when (re)training a model. In the case of computer vision, the more the better: the more varied images we can feed into the model, the better it will be trained and the better it will handle new, unseen situations. Going from the initial training, where we use limited data augmentation, to the final model, where the data augmentation is optimized for the use case, we usually see a significant increase in accuracy.

In a concrete case with farmer Piet, we used rescaling, rotation, color, and exposure (brightness) augmentations. Using these techniques, we went from 200 images to 2000 images. Without data augmentation, the accuracy was only 59%. However, when we used data augmentation, the accuracy went up to 83%! An increase of 24 percentage points for the same number of original images! Great, right?

Conclusion

When it comes to applying data augmentation, there is no one-size-fits-all. Which techniques can be used depends heavily on the type of application and camera setup. At Mediaan Conclusion, we always start with a base set of data augmentations that we know will work. We optimize the generation of new training images through experiments to find the best solution for the given use case.

Ready to give computer vision a shot and get the most out of your security camera? We are always ready to help and find a solution YOU want! Still not completely convinced? Take a look at our 4 Easy & Simple Steps To Start Using Computer Vision blog and find out how we can help you gain extra insights for your business by using computer vision.