Introduction to Automated Machine Learning

Let’s talk about automated machine learning, often called AutoML. First, let’s look at what automated machine learning is NOT, as much of the attention it receives is misguided. If you follow the general news, you will come across statements such as the following:

“MIT’s automated machine learning works 100x faster than human data scientists”

“Automated machine learning puts analytical models on autopilot”

“Google’s AI-Building AI is a step toward self-improving AI”

Reading these statements, it seems that we have reached a point where we can sit back and let computers solve our most complicated problems. Some even suggest that data scientists and machine learning engineers have become redundant. However, the people currently building machine learning applications will likely tell you that AutoML is powerful but cannot yet compete with custom jobs. Who is right, then?

What is Machine Learning?

Before taking a closer look at what AutoML is, we need to define what machine learning is trying to accomplish in the first place. When you want to build a machine learning function, you need to take the following steps:

1. Define a parameterized function that makes predictions.
This means that a given input produces an output that depends on parameters. Some of these are learned, and the others are fixed in advance and called hyperparameters. As for the functions, think of neural networks, Gaussian processes, decision trees, or any machine learning model you might have heard of. At a high level, they can be seen as black-box functions.
2. Define a utility function that evaluates the performance.
The core of machine learning is learning to be good at a task. Utility functions can range from the accuracy of a model on an independent test set to measures that also include computing costs, interpretability, and any other heuristics and business constraints. The goal is to provide a single number that reflects the quality of the function.
3. Learn the parameters that optimize performance.
This is where the magic happens and what machine learning research is all about. It deals with building models and learning mechanisms that efficiently find the parameters that optimize the utility function.
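
To make the three steps concrete, here is a minimal Python sketch. It assumes a one-dimensional linear model and made-up toy data; the names predict, utility, and learn are illustrative and do not come from any library. The learning rate and the number of epochs play the role of hyperparameters, while w and b are the learned parameters.

```python
import random


def predict(x, w, b):
    """Step 1: a parameterized function; w and b are the learned parameters."""
    return w * x + b


def utility(w, b, data):
    """Step 2: negative mean squared error, so that higher is better."""
    return -sum((predict(x, w, b) - y) ** 2 for x, y in data) / len(data)


def learn(data, learning_rate=0.05, epochs=500):
    """Step 3: find w and b by gradient descent on the mean squared error.

    learning_rate and epochs are hyperparameters: fixed before training.
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data drawn from y = 2x + 1 plus a little noise (made up for illustration).
rng = random.Random(0)
train = [(k / 10, 2 * (k / 10) + 1 + rng.gauss(0, 0.1)) for k in range(-20, 21)]

w, b = learn(train)
print(f"w={w:.2f}, b={b:.2f}, utility={utility(w, b, train):.4f}")
```

Changing learning_rate or epochs by hand and rerunning is exactly the kind of manual loop that AutoML sets out to automate.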

What is AutoML?

So, what is the difference between automated and traditional machine learning? Technically, the only difference is that some of the parameters that used to be fixed now become learnable. That’s it. Nothing fancier than that. At its most basic, AutoML could do an exhaustive search over possible hyperparameters and models. Isn’t that brute force? Yes. Is that a problem? No. Here are three reasons why:

1. Apart from large deep learning models, plenty of models can be trained in a reasonable amount of time and quite cheaply, since ready-to-use implementations exist.
2. While you spend time designing the best custom model for your use case, plenty of experiments can run in the background, especially at night or during lunchtime. They can provide valuable guidance on what works well and what doesn’t. Thank you, cloud!
3. If you have never thought of building a simple script that runs through multiple models and hyperparameters, you’re either lying or you might want to reconsider how you approach your daily job as a data scientist.
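
As an illustration of such a brute-force script, here is a minimal sketch in plain Python. It assumes two hand-written toy regressors (a k-nearest-neighbours model and a histogram model) and synthetic data; the function names and the search space are made up for this example and are not part of any AutoML library.

```python
import math
import random


def make_knn(k):
    """k-nearest-neighbours regressor; k is the hyperparameter."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)
        return predict
    return fit


def make_bin_average(n_bins):
    """Histogram regressor on [0, 1); n_bins is the hyperparameter."""
    def fit(train):
        overall = sum(y for _, y in train) / len(train)
        sums, counts = [0.0] * n_bins, [0] * n_bins
        for x, y in train:
            i = min(int(x * n_bins), n_bins - 1)
            sums[i] += y
            counts[i] += 1
        # Empty bins fall back to the overall mean of the training targets.
        means = [s / c if c else overall for s, c in zip(sums, counts)]
        return lambda x: means[min(int(x * n_bins), n_bins - 1)]
    return fit


def utility(predict, data):
    """Negative mean squared error on held-out data (higher is better)."""
    return -sum((predict(x) - y) ** 2 for x, y in data) / len(data)


def exhaustive_search(search_space, train, valid):
    """Try every (model, hyperparameter) pair and keep the best one."""
    results = []
    for name, (factory, values) in search_space.items():
        for value in values:
            predict = factory(value)(train)
            results.append((utility(predict, valid), name, value))
    return max(results), results


rng = random.Random(0)


def sample(n):
    """Noisy samples of sin(2*pi*x) on [0, 1); toy data for illustration."""
    points = []
    for _ in range(n):
        x = rng.random()
        points.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.2)))
    return points


train, valid = sample(200), sample(100)
search_space = {
    "knn": (make_knn, [1, 5, 15, 50]),
    "bin_average": (make_bin_average, [2, 5, 10, 40]),
}
best, results = exhaustive_search(search_space, train, valid)
print("best (utility, model, hyperparameter):", best)
```

Every combination is trained and scored independently, which is why this kind of search is easy to run in the background or in parallel.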

Building machine learning models is an experiment-driven science. You cannot know for sure what will work well. You need to design experiments, run them, and analyze the results. If a given experiment is expensive to run, you need to think carefully so as not to waste resources on experiments whose results are expected to be underwhelming. Now you might be thinking: “Sure, AutoML is pretty much random search and grid search, what’s the big deal? We’ve been doing that for a while.”

The big deal is that there are now many more sophisticated ways to automate building machine learning models. Random search and grid search are of great practical value because they are incredibly simple to implement; however, they are not the most sophisticated search mechanisms. Indeed, the search strategy is fixed and does not learn from previous results. Better meta-learning mechanisms exist today. Let’s talk about them!

Evolutionary Algorithms

Inspired by Darwin’s theory of evolution, evolutionary algorithms work as follows:

1. Generate a population (in our case, machine learning models with certain hyperparameters)
2. Calculate the utility score (“fitness”) for each individual in that population (accuracy, F1-score, MSE, etc.)
3. Survival of the fittest (only keep the best-performing models)
4. Generate more individuals based on the characteristics of the survivors (mating, crossover, mutation, etc.)
5. Repeat steps 2-4 until the results are satisfactory
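
The loop above can be sketched in a few lines of plain Python. This is an illustrative toy, not how any particular package implements it: it assumes a hand-written distance-weighted k-nearest-neighbours regressor whose hyperparameters k and power are evolved, synthetic data, and made-up population sizes and mutation ranges.

```python
import math
import random

rng = random.Random(0)


def sample(n):
    """Noisy samples of sin(2*pi*x) on [0, 1); toy data for illustration."""
    points = []
    for _ in range(n):
        x = rng.random()
        points.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.2)))
    return points


TRAIN, VALID = sample(150), sample(80)


def fitness(individual):
    """Utility score: negative validation MSE of a distance-weighted kNN."""
    k, power = individual["k"], individual["power"]
    total = 0.0
    for x, y in VALID:
        nearest = sorted(TRAIN, key=lambda p: abs(p[0] - x))[:k]
        weights = [1.0 / (abs(nx - x) + 1e-6) ** power for nx, _ in nearest]
        pred = sum(w * ny for w, (_, ny) in zip(weights, nearest)) / sum(weights)
        total += (pred - y) ** 2
    return -total / len(VALID)


def random_individual():
    return {"k": rng.randint(1, 60), "power": rng.uniform(0.0, 3.0)}


def crossover(a, b):
    """The child takes each hyperparameter from one of its parents at random."""
    return {key: rng.choice((a[key], b[key])) for key in a}


def mutate(ind):
    """Apply a small random change, clipped to the allowed range."""
    return {
        "k": min(60, max(1, ind["k"] + rng.randint(-3, 3))),
        "power": min(3.0, max(0.0, ind["power"] + rng.gauss(0, 0.3))),
    }


def evolve(population_size=12, survivors=4, generations=8):
    population = [random_individual() for _ in range(population_size)]  # step 1
    history = []
    for _ in range(generations):                                        # step 5
        ranked = sorted(population, key=fitness, reverse=True)          # step 2
        history.append(fitness(ranked[0]))
        parents = ranked[:survivors]                                    # step 3
        children = [mutate(crossover(*rng.sample(parents, 2)))          # step 4
                    for _ in range(population_size - survivors)]
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve()
print("best hyperparameters:", best)
print("best fitness per generation:", [round(h, 4) for h in history])
```

Because the survivors are carried over unchanged into the next generation, the best fitness in this sketch can never get worse from one generation to the next.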

TPOT is a popular Python package, built on top of the scikit-learn API, that implements AutoML via an evolutionary algorithm.

Gradient-Based Optimization

Gradient-based optimization is fairly straightforward for those familiar with the gradient descent algorithm, and there is not much else one needs to know to understand how it works. Implementing gradient-based AutoML boils down to the following steps:

1. Sample a set of hyperparameters
2. Train a model with these hyperparameters
3. Get the utility score of the model
4. Using the gradient of the utility score with respect to the hyperparameters, select a new set of hyperparameters that will improve the utility score
5. Repeat steps 2 to 4 until the results are satisfactory

The complexity, or “magic,” as some people like to call it, comes from calculating the gradient efficiently and building differentiable utility functions.
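
As a toy illustration of these steps, the sketch below tunes a single hyperparameter by gradient ascent. It assumes a one-dimensional ridge regression, chosen because it has a closed-form solution that makes the utility differentiable by hand; the datasets, the step size, and the choice to optimize log(lambda) rather than lambda are all assumptions made for this example.

```python
import math

# Tiny hand-made datasets (illustrative values only).
TRAIN = [(1.0, 1.5), (2.0, 2.5), (3.0, 5.0)]
VALID = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9)]


def train_ridge(lam):
    """Step 2: closed-form 1-D ridge regression, w = Sxy / (Sxx + lam)."""
    sxy = sum(x * y for x, y in TRAIN)
    sxx = sum(x * x for x, _ in TRAIN)
    return sxy / (sxx + lam)


def utility(lam):
    """Step 3: negative mean squared error on the validation set."""
    w = train_ridge(lam)
    return -sum((w * x - y) ** 2 for x, y in VALID) / len(VALID)


def utility_gradient(log_lam):
    """Gradient of the utility with respect to log(lam), via the chain rule."""
    lam = math.exp(log_lam)
    sxy = sum(x * y for x, y in TRAIN)
    sxx = sum(x * x for x, _ in TRAIN)
    w = sxy / (sxx + lam)
    d_utility_d_w = -sum(2 * (w * x - y) * x for x, y in VALID) / len(VALID)
    d_w_d_lam = -sxy / (sxx + lam) ** 2
    d_lam_d_log_lam = lam
    return d_utility_d_w * d_w_d_lam * d_lam_d_log_lam


def optimize(log_lam=0.0, step_size=0.5, iterations=200):
    """Steps 4 and 5: move log(lam) along the gradient, then repeat."""
    for _ in range(iterations):
        log_lam += step_size * utility_gradient(log_lam)
    return math.exp(log_lam)


best_lam = optimize()
print(f"lambda={best_lam:.3f}, utility={utility(best_lam):.4f}")
```

Working in log(lambda) keeps the regularization strength positive without any clipping. For real models the gradient is rarely available in closed form like this, which is where the complexity mentioned above comes from.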

Bayesian Optimization

Let’s assume that you are a boxer and want to find out where you rank in the world. You have some confidence in your abilities. You can say whether you believe you can win or lose against any opponent. You also have a manager, and their role is to make sure that you end up at the highest possible rank in the world. To do that, they need to make sure that you have easy fights to secure a good record, but on the other hand, they need to ensure that you fight boxers ranked higher than you because that’s the only way you will move up.

This is what Bayesian optimization is in a nutshell. Two functions are used in tandem to decide which set of hyperparameters to try next. The first one, the surrogate function, estimates how good the model can be if it is trained with a specific set of hyperparameters. It is similar to the confidence a boxer has that they will win a fight. The second one, the selection function (often called the acquisition function), ensures that the chosen set of hyperparameters is likely to improve the model. Indeed, there is no point in trying hyperparameters that are almost certain to provide lower performance. This is like a manager ensuring that a boxer has winnable yet challenging fights.

AutoML via Bayesian optimization looks like this:

1. Sample a set of hyperparameters
2. Train a model with them and update the surrogate function with the resulting score
3. Select new hyperparameters based on the surrogate and selection functions
4. Repeat steps 2-3 until satisfactory results are reached
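
The following plain-Python sketch shows one way this loop could look. It is a toy under several assumptions: the model is a one-dimensional ridge regression on made-up data, the surrogate is a small Gaussian process with a squared-exponential kernel, the selection function is an upper confidence bound (mean plus kappa times standard deviation), and the search starts from the two ends of a fixed grid instead of a random sample. None of these choices are prescribed by the steps above.

```python
import math

# Tiny hand-made datasets (illustrative values only).
TRAIN = [(1.0, 1.5), (2.0, 2.5), (3.0, 5.0)]
VALID = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9)]


def train_and_score(log_lam):
    """Train 1-D ridge regression and return the negative validation MSE."""
    lam = math.exp(log_lam)
    w = sum(x * y for x, y in TRAIN) / (sum(x * x for x, _ in TRAIN) + lam)
    return -sum((w * x - y) ** 2 for x, y in VALID) / len(VALID)


def kernel(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for tiny matrices)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def posterior(xs, zs, candidates, noise=1e-4):
    """Surrogate: GP posterior mean and standard deviation at each candidate."""
    n = len(xs)
    k_inv = invert([[kernel(a, b) + (noise if i == j else 0.0)
                     for j, b in enumerate(xs)] for i, a in enumerate(xs)])
    alpha = [sum(k_inv[i][j] * zs[j] for j in range(n)) for i in range(n)]
    stats = []
    for c in candidates:
        k_star = [kernel(c, x) for x in xs]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = [sum(k_inv[i][j] * k_star[j] for j in range(n)) for i in range(n)]
        var = 1.0 - sum(k * vi for k, vi in zip(k_star, v))
        stats.append((mean, math.sqrt(max(var, 0.0))))
    return stats


def bayesian_optimization(n_iter=12, kappa=2.0):
    grid = [-3.0 + 0.1 * i for i in range(91)]      # candidate log(lambda) values
    xs = [grid[0], grid[-1]]                        # step 1: initial hyperparameters
    ys = [train_and_score(x) for x in xs]           # step 2: train and score
    for _ in range(n_iter):                         # step 4: repeat
        mean_y = sum(ys) / len(ys)
        std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mean_y) / std_y for y in ys]     # standardize the scores
        candidates = [g for g in grid if g not in xs]
        stats = posterior(xs, zs, candidates)       # step 2: update the surrogate
        ucb = [m + kappa * s for m, s in stats]     # selection function
        nxt = candidates[ucb.index(max(ucb))]       # step 3: pick what to try next
        xs.append(nxt)
        ys.append(train_and_score(nxt))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


best_x, best_y, tried = bayesian_optimization()
print(f"best log(lambda)={best_x:.1f}, utility={best_y:.4f}, evaluations={len(tried)}")
```

The kappa parameter plays the role of the manager in the boxing analogy: a larger value favours uncertain, unexplored hyperparameters, while a smaller value favours those the surrogate already believes to be good.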

A significant advantage of Bayesian optimization is that it can be applied to any machine learning model, unlike gradient-based approaches, which need a differentiable utility function. This is why it is the most popular AutoML approach so far among open-source solutions (AutoML, Auto-Sklearn, Auto-Keras, HyperOpt, GPyOpt, etc.).

Will AutoML Replace Data Scientists?

No, AutoML will not replace data scientists, at least not the good ones. Even if research found a way to build the perfect AutoML solution, which it hasn’t and is unlikely to, a data scientist’s job doesn’t only consist of building models. A good data scientist does much more than that. They assess the relevance of a solution; they help build tools to monitor it, improve it, and integrate it into existing systems. They help meet non-functional requirements such as speed, explainability, and robustness. They help design solutions and data pipelines that ensure the models they have built will continually improve and provide value to the end user.
If your job is to build models, then AutoML should be part of your toolbox! It is a remarkable way to establish reasonable lower bounds for performance quickly and cheaply. You might be able to outperform it by creating a highly customized solution, and if you do, that’s great. But don’t try to be a hero: stick to the basics first, and AutoML is a great way to do so.

This blog is written by Valentin Calomme, AI Engineer at Mediaan.