In our “4 Quick & Easy steps to start using Computer Vision” blog, we introduced four easy steps for getting started with Computer Vision. Today we will take a deeper look at the very first step, “Structure your visual data”, or what we often call “labelling”. Let’s find out why it is important to label unstructured data, how to do it properly, and what benefits you can gain from it.

Try describing some pixels in an image…

You probably know the saying: “What you see is what you get”. But can you easily explain and define what you see? When building software solutions that automate manual labor processes, the challenging part is handling visual input data from images, videos, and live streams. With traditional software and common ‘if-this-then-that, else …’ logic, it is difficult to handle the complexity of unstructured visual data. If you want to build a software solution, you need to be able to define what logic should be applied.

Just tell what you see!

Luckily, with modern Computer Vision technologies, we have found ways to get rid of these explicit “if-then-else” definitions. We can work from real-life images and videos and simply mark and label the things we see in the image or video frame. We let the algorithms train themselves and create that definition of the objects for us. The resulting ‘smart algorithms’ (the machine learning models) can handle many variations of the objects visible in the images and videos. It is often astonishing how precise the detections of these algorithms are.

Labelling objects

Now let’s dive into the very first step of getting started with Computer Vision: “Structure your visual data”, or what we call “labelling”. We do this by marking an area in the image or video frame and labelling it with a class that names what is visible in that area. Let’s use one of our Computer Vision projects as an example, in which we taught our algorithms to detect a pig. When labelling, these are the steps we usually follow:


1. We start by capturing some videos of what the cameras can see. This includes moments when the stable is empty (no pigs visible) and moments with one or more pigs going about their day-to-day behavior: walking around, sleeping, eating, etc.

2. Using a labelling tool, we define the classes/labels we are interested in. In this example, specifically the pigs’ tails, because we want to analyze whether the tails are wagging or not, and how that correlates with the pigs’ behavior. We simply define “PIG” and “PIG TAIL” as two classes/labels and mark an area in the image in the shape of a rectangle.

3. Then we apply a label to each of these rectangles. We do this for quite a few images/frames in the videos, enough to capture some variation in the pigs’ positions, sizes, lighting conditions, etc. A few minutes of video is often sufficient to get started.
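The result of these labelling steps is, in essence, a list of rectangles with class names per image or frame. As a minimal sketch in Python of what such a label record could look like (the structure, field names, file name, and pixel coordinates are illustrative assumptions, not a specific labelling tool’s export format; only the class names “PIG” and “PIG TAIL” come from our example):

```python
from dataclasses import dataclass

@dataclass
class BoxLabel:
    """One labelled rectangle in an image or video frame."""
    label: str    # class name, e.g. "PIG" or "PIG TAIL"
    x: int        # top-left corner of the rectangle (pixels)
    y: int
    width: int    # rectangle size (pixels)
    height: int

# One labelled frame: a file reference plus its labelled rectangles.
frame_labels = {
    "frame": "stable_cam1_frame_0042.jpg",
    "boxes": [
        BoxLabel("PIG", x=120, y=80, width=300, height=180),
        BoxLabel("PIG TAIL", x=390, y=140, width=40, height=35),
    ],
}

for box in frame_labels["boxes"]:
    print(box.label, box.x, box.y, box.width, box.height)
```

Real labelling tools export comparable structures (often as JSON or XML), which is exactly the “structured visual data” the training step consumes.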

Definition and detection magic

These labelled images are used as input for the algorithms to train themselves, learning to define what a ‘pig’ is and is not, without us humans having to describe or code anything further about the pig. We only marked some areas visually in the images, and the algorithms do the rest. And the results are amazingly accurate within a very short time (a few hours to days).
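How “accurate” a detection is, is commonly measured by comparing the rectangle the model predicts with the rectangle a human labelled, using Intersection over Union (IoU). A minimal sketch in Python (the example boxes and the 0.5 rule of thumb are illustrative assumptions, not values from our pig project):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, width, height) rectangles."""
    ax1, ay1 = box_a[0], box_a[1]
    ax2, ay2 = ax1 + box_a[2], ay1 + box_a[3]
    bx1, by1 = box_b[0], box_b[1]
    bx2, by2 = bx1 + box_b[2], by1 + box_b[3]

    # Overlap rectangle (zero width/height if the boxes do not intersect).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    return intersection / union if union else 0.0

labelled = (100, 100, 200, 150)   # what a human marked
predicted = (110, 105, 195, 150)  # what the trained model returned

score = iou(labelled, predicted)
print(f"IoU = {score:.2f}")  # a common rule of thumb: >= 0.5 counts as a hit
```

An IoU of 1.0 means the prediction matches the label exactly; 0.0 means they do not overlap at all.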

Focus on your use cases, not the technicalities

So, it basically is as simple as “What you say you see, is what you get”. Of course, there are some technical steps that Data Science and Machine Learning experts need to take to select the best-performing algorithms. But this technical complexity is itself becoming more and more automated, so that we can focus on the actual use cases and opportunities of adding Computer Vision technologies to our business processes. Imagine having to analyze high volumes of high-speed video 24/7, as in the following situations:

  • Doing quality checks on an assembly line.

  • Observing areas for suspicious behaviors of people at secured sites like solar fields or seaports.

  • Detecting the demographics of people, such as age and gender, in retail stores.

  • Or… even monitoring the wagging of pigs’ tails, if you like.

Ready to try it yourself?

There are actually no theoretical limits to what is possible with technologies like Computer Vision. Computers don’t need sleep and can be scaled with the push of a button. Humans do need sleep, are subjective, and inevitably make errors at some point. So, what use case for Computer Vision are you thinking of in your business or organization? Don’t hesitate to contact the experts at Mediaan to develop a fully production-ready Computer Vision system, or, as a first step, let us convince you of the possibilities during a 1-month Proof of Value in these 4 quick & easy steps. Check out our other related Computer Vision blogs here.

This blog is written by Micha Verboeket – Computer Vision Product Manager and Director Mediaan Hasselt.