Within machine learning there is a form of learning called supervised learning. But what does “supervised” mean? Are there any special considerations you need to make when doing supervised learning? Read on for an introduction to supervised learning.
In our previous blog post, we introduced machine learning as:
"an umbrella of a specific set of algorithms that all have one specific purpose to learn to detect certain patterns from data." (neurospace)
To learn to distinguish these patterns (think of colors or shapes), training must take place, in which a machine learning algorithm is introduced to the given pattern so that it can learn to recognize it.
There are, roughly speaking, three different learning methods that we can use when training a machine learning model: Supervised learning, Unsupervised learning, and Reinforcement learning. In this blog post we will dive into what Supervised Learning is, when you can use it, and what requirements it places on your data.
Supervised Learning: A practical example
Do you remember the time you first learned to multiply?
Or when you learned to drive a car?
In both cases, you were no doubt accompanied in your learning by a supervisor, e.g. a math teacher or a driving instructor.
When we as humans have to learn something entirely new, we are often accompanied in our learning by a more competent person (a supervisor).
This is the same thing we do in supervised learning.
If you help to obtain data, it is you who helps to supervise the algorithm.
You supervise the algorithm by providing a “label” that tells it which patterns it must learn to recognize and distinguish from each other.
This gives you the first requirement for your data if you want to use supervised learning: a label must exist.
In the image example, we want to make a machine learning model that can determine whether there is an apple or a banana on a conveyor belt. We must therefore use pictures of both apples and bananas, and for each picture, it must be stated whether it depicts an apple or a banana. Once the machine learning model is trained, it can be used to make predictions on new data that does not have a label. The important thing is that a label is present during the training of the model; otherwise you cannot do supervised learning.
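To make the role of the label concrete, here is a minimal sketch of training on labeled examples and then predicting on an unlabeled one. It is not how a production image model works: it assumes each picture has already been reduced to two made-up numeric features (`elongation` and `yellowness`, both hypothetical), and it uses a simple nearest-centroid rule as a stand-in for a real machine learning algorithm.

```python
from math import dist

# Hypothetical, made-up features for each picture: (elongation, yellowness),
# both scaled to the range 0..1. The second element of each pair is the label.
labeled_data = [
    ((0.10, 0.20), "apple"),
    ((0.15, 0.30), "apple"),
    ((0.20, 0.25), "apple"),
    ((0.80, 0.90), "banana"),
    ((0.85, 0.80), "banana"),
    ((0.90, 0.95), "banana"),
]


def train(examples):
    """Compute the mean feature vector (centroid) for each label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Training needs the labels; prediction does not.
centroids = train(labeled_data)
prediction = predict(centroids, (0.82, 0.88))
print(prediction)
```

Note that `train` cannot run without the label in each pair, while `predict` only receives features. That is the supervised learning requirement in code form.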
What problems can we solve with Supervised Learning?
There are two types of problems that can be solved with Supervised Learning:
- Classification
- Regression (predictions)
Classification
An example of classification was given in the practical example, where we want to tell cupcakes from apples, or apples from bananas.
You can also be more specific in your classification and ask a machine learning model to detect the variety of a banana or an apple, as in the example from a previous blog post about estimating 101 different sorts of fruit.
Classification can also be used to determine the cause of unplanned downtime, or to detect whether a person has cancer, for example breast cancer.
You have no doubt come across solutions with supervised learning
Every time you use Google Assistant or Siri, you help the ecosystem get better.
You are supervising Google Assistant and Siri to become better at helping not just you, but all users who use the solution.
The same is true if you use Google Translate.
We can probably all remember how many mistranslations there were in Google Translate from Danish to English just 10 years ago compared to today.
This is because all of us who use Google Translate help improve the translation by suggesting corrections to it.
Listed below are a few other places where you can help the ecosystem and act as a supervisor:
- Google Maps
- Brobizz “Pay by Plate”
- Police, when they scan your license plate when passing you on the road
- Speech to Text
- Text to Speech
Finally, I highly recommend that you try Google Quick Draw. It is an easy and intuitive way to understand how supervised learning works.
Regression
All regression problems are about predicting something.
A prediction can be how many degrees it will be in the shade tomorrow, how many days are left before a crash occurs, or what your dream house will cost.
It is within regression problems that many companies can gain large advantages from using machine learning.
Regression models also require a label. In some cases, such as predicting the weather or heat consumption, the label is natural and easy to obtain because it can be created programmatically: a little software can derive the label from data that is already present. If, on the other hand, we want to predict the price of a house, we need the actual sales prices of houses in the area we want to predict for, and preferably sales that are as recent as possible, since we know that prices change over time.
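As an illustration of creating a label programmatically, the sketch below pairs each reading in a time series with the reading that follows it, so that "tomorrow's value" becomes the label for "today's value". The temperature values and the helper name `make_labeled_examples` are made up for this example; a real project would likely use more features than a single reading.

```python
# Hypothetical daily temperature readings in degrees Celsius (made-up values).
temperatures = [12.0, 13.5, 11.0, 14.0, 15.5]


def make_labeled_examples(series, horizon=1):
    """Pair each reading with the reading `horizon` steps later as its label."""
    return [
        (series[i], series[i + horizon])
        for i in range(len(series) - horizon)
    ]


examples = make_labeled_examples(temperatures)
for today, tomorrow in examples:
    print(f"feature={today} label={tomorrow}")
```

The last reading gets no label, because its "tomorrow" has not been measured yet. This is why the result has one example fewer than the original series.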
Practical Examples
At Kredsløb, we used regression models to predict unwanted situations in the transmission system.
In previous blog posts, we showed how we can use regression models to predict unplanned downtime on water pumps and on aircraft engines for NASA.
Finally, you can use regression models to predict demand and thus plan production according to the actual need.
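To give a feel for what a regression model does with labeled data, here is a minimal sketch that fits a straight line with ordinary least squares and uses it to forecast demand. It is not the method used in the projects mentioned above: the weekly sales figures are made up, and `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: week number (feature) and units sold (label).
weeks = [1, 2, 3, 4, 5]
units_sold = [100, 110, 120, 130, 140]

slope, intercept = fit_line(weeks, units_sold)

# Predict demand for week 6, which has no label yet.
forecast = slope * 6 + intercept
print(forecast)
```

Unlike classification, the label here is a number rather than a category, and the model's output is a number as well.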
Tips for Supervised Learning
If you have a problem that matches the examples above, you will most likely need Supervised Learning. If you are going to start a supervised learning project, you can already make it easier today by preparing your data. Here are our top tips for getting started with supervised learning:
- Compliance: For example, if you are interested in predictive maintenance, you should start today to keep your maintenance journal up to date with all the important information about your machines: when a breakdown occurs, when maintenance was performed, whether you have upgraded to newer machines (perhaps from a different brand), etc.
- “Other” Category: Avoid having an “other” category in your classification label. An “other” category is cluttered and often does not contain one specific pattern. Finally, we often find that if a company has an “other” category, it often accounts for more than 50% of the data. There is not much value in knowing that it was “something else” that caused the breakdown.
- Clear Classes: If you want to be able to tell things apart, it is important that your labels are correct and that each class is clearly different from the other classes. You must not have a class that contains a little of everything.
- Right Data: Make sure that you start gathering the right data today for the problem you wish to solve.
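The tip about the “other” category can be checked with a few lines of code before a project starts. The sketch below counts how large a share of the records each label covers. The breakdown causes are made-up values standing in for a real maintenance journal, and `label_shares` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical breakdown causes taken from a maintenance journal (made-up values).
labels = ["bearing", "other", "leak", "other", "other", "bearing", "other", "other"]


def label_shares(values):
    """Return each label's share of all records, as a fraction between 0 and 1."""
    counts = Counter(values)
    total = len(values)
    return {label: count / total for label, count in counts.items()}


shares = label_shares(labels)
other_share = shares.get("other", 0.0)

# Flag the data set if "other" covers more than half of the records.
if other_share > 0.5:
    print(f'"other" covers {other_share:.0%} of the records: consider splitting it up')
```

If the check fires, it is usually worth going back to the source of the data and recording the actual cause before training a model.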
// Maria Hvid, Machine Learning Engineer @ neurospace