In this blog post you will learn what machine vision is and how a model can be trained to detect objects in images. Machine vision, or computer vision, is the field of making machines see and detect objects in images and videos with a certain confidence level.

Cutting board with cut fruit

There are many applications for machine vision, especially in manufacturing. It is already used for various quality inspection tasks, such as inspecting moldings, finding defects in wood, assessing the quality of welds, making robots see and interact with objects, and detecting unwanted objects on the assembly line.

To show how machine vision works, and to kickstart your imagination about how it can add value in an industrial setting, we will walk through a machine learning model that can detect different types of fruit.

Machine Learning for Sorting Fruit

Imagine that you run an industrial fruit farm that produces various types of fruit. All the fruit is collected by robots and stored together in your storage facility. Before it is packaged for sale, it has to be sorted: you want to separate bananas from pears, apples, grapes, and so on. To complicate the task, there are also different varieties of bananas, apples, and grapes, and you want to sort these as well. For example, you may want red bananas in one type of packaging and yellow bananas in another, and the same goes for white versus red grapes.

What if you could use a low-cost embedded system and a camera to build a machine that does this task for you automatically? A Convolutional Neural Network (CNN) might just do the trick.

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are a type of deep learning architecture commonly used in machine vision. The CNN architecture is inspired by the human visual cortex and alternates between stacked convolutional layers and spatial pooling layers. The convolutional layers slide small learned filters across the whole image, so every pixel contributes to the features used to decide which class a picture belongs to, while the pooling layers summarize the resulting feature maps at a lower resolution.
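To make the convolution and pooling steps concrete, here is a minimal pure-Python sketch of a single convolutional filter followed by ReLU and 2x2 max pooling on a tiny grayscale image. It is an illustration only: the 3x3 vertical-edge filter is hand-picked (in a CNN the filter weights are learned), real layers such as Keras's Conv2D also handle colour channels, many filters, and a bias term, and the helper names conv2d_valid, relu, and max_pool_2x2 are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw patch currently under the kernel
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window (odd edges are dropped)."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[i][j], feature_map[i][j + 1],
                           feature_map[i + 1][j], feature_map[i + 1][j + 1]))
        pooled.append(row)
    return pooled


# Toy 6x6 image: dark left half (0), bright right half (1)
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked vertical-edge filter; in a trained CNN these weights are learned
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

features = relu(conv2d_valid(image, kernel))  # 4x4 feature map
pooled = max_pool_2x2(features)               # 2x2 after pooling
print(features)
print(pooled)
```

The filter responds only where the dark and bright halves meet, and pooling halves the feature map in each direction while keeping the strongest responses.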

Machine vision is the science that aims to give a computer system a visual classification capability similar to, or better than, that of humans [1]. CNNs are a robust type of network: the images require minimal preprocessing before training can begin, and CNNs have the potential to perform well on smaller datasets.

You will need pictures of all the categories you want the machine vision model to identify. If we want our model to distinguish bananas from apples and vice versa, we must show it pictures of both apples and bananas, taken from different angles, in different sizes, under different lighting, and with different shapes. It is also crucial that the data is labeled as either an apple or a banana, because training a CNN classifier like this one is a supervised learning approach. Labeling the data means naming each picture as either an apple or a banana, or putting all the apples in one folder, all the bananas in another, and so on for grapes and other fruit. In this way, you act as the supervisor of the training session when creating the machine learning model.
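The folder-based labeling described above can be sketched in a few lines of standard-library Python. This is a hypothetical helper, not part of the original project: it assumes one subfolder per class (for example `train/Banana/` and `train/Apple Golden/`) and treats each folder name as the label. Keras's `ImageDataGenerator.flow_from_directory` follows the same folder-per-class convention and assigns class indices in alphanumeric order of the folder names.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def labeled_images(root):
    """Return (image_path, label) pairs, using each image's parent folder name as its label."""
    root = Path(root)
    pairs = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((image_path, class_dir.name))
    return pairs


def class_indices(pairs):
    """Map each label to an integer index, in sorted (alphanumeric) order."""
    return {label: i for i, label in enumerate(sorted({label for _, label in pairs}))}


# Usage: build a tiny throwaway folder tree with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    layout = {"Banana": ["b1.jpg"], "Apple Golden": ["a2.jpg", "a1.jpg"]}
    for label, files in layout.items():
        (Path(tmp) / label).mkdir()
        for name in files:
            (Path(tmp) / label / name).touch()
    (Path(tmp) / "Banana" / "notes.txt").touch()  # non-image files are ignored

    pairs = labeled_images(tmp)
    names = [(path.name, label) for path, label in pairs]
    indices = class_indices(pairs)

print(names)
print(indices)
```

Using the folder name as the label means no separate annotation file has to be kept in sync with the images: moving a picture to another folder relabels it.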

Introduction to the dataset

The dataset is from Kaggle and contains pictures of 101 different types of fruit. In total there are n = 69,905 pictures, of which 52,262 are used for training and 17,540 for testing [2]; the remaining 103 images belong to neither split. Different varieties of, for example, apples and bananas are labeled differently, so our model should be able to predict whether an apple is of the variety Crimson Snow, Golden, Golden-Red, Granny Smith, Red, or Red Delicious. To illustrate the complexity we are dealing with, below are pictures of two different types of banana. Can you see the difference between them? One is a regular banana, while the other is of the variety Banana Lady Finger. It is difficult even for humans to tell them apart when they are seen from these angles.

Figure: a Banana Lady Finger and a regular Banana.

Before training and testing the model, we preprocess the images. We rescale the RGB channels from the range 0-255 to values between 0 and 1, because small, consistently scaled inputs are easier for the model to process, reduce the risk of unstable (vanishing or exploding) gradients, and typically make training converge faster.

Preprocessing images
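from keras.preprocessing.image import ImageDataGenerator

# Training images: rescale to [0, 1] and apply random augmentation
train_datagen = ImageDataGenerator(rescale=1/255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)

# Test images: rescale only, so evaluation uses unmodified pictures
test_datagen = ImageDataGenerator(rescale=1/255)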

For the training data, shear_range=0.2 applies a random shear of up to ±0.2 (Keras interprets this value as degrees), zoom_range=0.2 zooms each image by a random factor between 0.8 and 1.2, and horizontal_flip=True mirrors each image with a probability of 50%. Because the dataset images are taken in near-perfect conditions, while our sorting machine may see the fruit differently, the training data is varied randomly to reduce overfitting and to make it resemble production conditions more closely. The test data is only rescaled. ImageDataGenerator applies all of this automatically, which gives a more robust model with little effort.
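The sketch below mimics, in plain Python, what the generator's rescale and horizontal_flip settings do to a single image, and how a float zoom_range of 0.2 translates into a zoom factor. It is an illustrative assumption rather than how Keras implements it internally: the image is a nested list of RGB pixels, the function names are hypothetical, and the actual zoom resampling and the shear are left out.

```python
import random


def rescale(image):
    """Scale 8-bit channel values (0-255) into [0, 1], as rescale=1/255 does."""
    return [[[channel / 255 for channel in pixel] for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror each row of pixels left to right."""
    return [list(reversed(row)) for row in image]


def sample_zoom_factor(zoom_range, rng):
    """A float zoom_range z means a random zoom factor drawn from [1 - z, 1 + z]."""
    return rng.uniform(1 - zoom_range, 1 + zoom_range)


def augment(image, rng, zoom_range=0.2, flip=True):
    """Rescale, then flip with probability 0.5; returns the image and a sampled zoom factor."""
    out = rescale(image)
    if flip and rng.random() < 0.5:
        out = horizontal_flip(out)
    # Resampling the pixels by the zoom factor is omitted in this sketch
    return out, sample_zoom_factor(zoom_range, rng)


# A 1x3 RGB "image": black, orange-ish, and blue-ish pixels
image = [[[0, 0, 0], [255, 128, 0], [51, 102, 255]]]
rng = random.Random(42)  # seeded so the run is reproducible
augmented, zoom = augment(image, rng)
print(augmented)
print(zoom)
```

Because every call draws new random values, the network rarely sees exactly the same training picture twice, which is what makes augmentation act as a regularizer.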

We then design our Convolutional Neural Network:

Convolutional Neural Network architecture
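from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Two convolution + max-pooling blocks extract visual features from 100x100 RGB images
model.add(Conv2D(32, (3, 3), input_shape=(100, 100, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten the feature maps and classify them with fully connected layers
model.add(Flatten())
model.add(Dense(units=128, activation='relu', use_bias=True))
# One output per fruit class; softmax turns the outputs into confidence scores
model.add(Dense(units=101, activation='softmax'))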
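A quick way to sanity-check this architecture is to trace the tensor shapes and count the trainable parameters layer by layer. The pure-Python sketch below does that arithmetic by hand, assuming the Keras defaults the code above relies on (padding='valid' and stride 1 for Conv2D, and a pooling stride equal to pool_size); shapes exclude the batch dimension, and the helper functions are hypothetical. Under those assumptions the totals should agree with what model.summary() reports.

```python
def conv2d(shape, filters, kernel=3):
    """'valid' convolution with stride 1: each spatial side shrinks by kernel - 1."""
    h, w, c = shape
    params = (kernel * kernel * c + 1) * filters  # weights plus one bias per filter
    return (h - kernel + 1, w - kernel + 1, filters), params


def max_pool(shape, pool=2):
    """Non-overlapping pooling; leftover rows/columns are dropped (floor division)."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0  # pooling has no trainable parameters


def flatten(shape):
    h, w, c = shape
    return (h * w * c,), 0


def dense(shape, units):
    (n,) = shape
    return (units,), (n + 1) * units  # weights plus one bias per unit


layers = [
    ("conv2d", lambda s: conv2d(s, filters=32)),
    ("max_pooling2d", max_pool),
    ("conv2d_1", lambda s: conv2d(s, filters=32)),
    ("max_pooling2d_1", max_pool),
    ("flatten", flatten),
    ("dense", lambda s: dense(s, units=128)),
    ("dense_1", lambda s: dense(s, units=101)),
]

shape = (100, 100, 3)  # input_shape from the model definition
summary = []
for name, layer in layers:
    shape, params = layer(shape)
    summary.append((name, shape, params))
    print(f"{name:<16} {str(shape):<16} {params:>10,}")

total = sum(params for _, _, params in summary)
print(f"Total trainable parameters: {total:,}")  # → Total trainable parameters: 2,190,085
```

Roughly 99% of the parameters sit in the first Dense layer, because it connects every one of the 16,928 flattened features to 128 units. That layer is therefore what dominates the model's size when it is deployed to an embedded device.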

The last layer has 101 neurons, one for each class of fruit, and uses the softmax activation function, which turns the outputs into confidence scores that behave like probabilities: each lies between 0 and 1, and together they sum to 1.
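The softmax step can be written out in a few lines. This is a minimal sketch of the standard formula, exp(z_i) divided by the sum of exp(z_j) over all classes, applied to three hypothetical raw scores instead of the model's 101. Subtracting the largest score first is a common trick to avoid overflow and does not change the result.

```python
import math


def softmax(logits):
    """Convert raw scores into confidence scores in (0, 1) that sum to 1."""
    largest = max(logits)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for three fruit classes
probs = softmax(scores)
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
print(predicted_class)               # → 0
```

The predicted class is simply the index with the highest score, and that score is the model's confidence in its prediction.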

Our machine vision model reaches a training accuracy of 98.39% with a loss of 0.0495, and a test accuracy of 98.39% with a loss of 0.0590.
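The post does not state which loss function was used. For a softmax output over 101 classes, categorical cross-entropy is the usual choice, so the sketch below assumes it. On three made-up predictions, it shows how accuracy (the share of images whose highest-scoring class is correct) and the average cross-entropy loss (the mean of -log of the probability given to the true class) are computed. The function names and the small eps clamp that guards against log(0) are illustrative assumptions.

```python
import math


def accuracy(probabilities, labels):
    """Share of samples whose highest-scoring class equals the true label."""
    correct = sum(
        1
        for probs, label in zip(probabilities, labels)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return correct / len(labels)


def categorical_cross_entropy(probabilities, labels, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    losses = [-math.log(max(probs[label], eps))
              for probs, label in zip(probabilities, labels)]
    return sum(losses) / len(losses)


# Hypothetical softmax outputs for three images over three classes
predictions = [
    [0.90, 0.05, 0.05],  # true class 0, predicted 0 (confident and correct)
    [0.10, 0.80, 0.10],  # true class 1, predicted 1
    [0.30, 0.40, 0.30],  # true class 2, predicted 1 (wrong)
]
true_labels = [0, 1, 2]

acc = accuracy(predictions, true_labels)
loss = categorical_cross_entropy(predictions, true_labels)
print(f"accuracy={acc:.4f} loss={loss:.4f}")  # → accuracy=0.6667 loss=0.5108
```

Accuracy only checks the top prediction, whereas the loss also penalizes low confidence in the correct class. This is why a model can show the same accuracy on two datasets but a higher loss on one of them, as with the training and test figures above.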

TensorFlow Lite

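The last thing we do is convert our model into a TensorFlow Lite model so that it can be deployed on an embedded device, such as the Google Coral Dev Board, and run in production. Note that the Coral board's Edge TPU accelerator only runs models that have been quantized to 8-bit integers and compiled with the Edge TPU Compiler; an unquantized TensorFlow Lite model runs on the board's CPU instead. The basic conversion looks like this: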

TensorFlow Lite
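import tensorflow as tf

# Load the trained Keras model (saved earlier as model.h5) and convert it
model = tf.keras.models.load_model("model.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the serialized TensorFlow Lite model to disk
with open("model.tflite", "wb") as f:
    f.write(tflite_model)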
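On the device, the converted model still returns one softmax score per class for each image. How those scores drive the sorting machine is not covered in this post; the sketch below is one hypothetical post-processing step that picks the most likely class and only accepts it above a confidence threshold. The function name, the 0.9 threshold, the example class names, and the idea of diverting uncertain fruit for manual inspection are all assumptions for illustration, and reading the scores from the TensorFlow Lite interpreter is not shown.

```python
def sorting_decision(probabilities, class_names, threshold=0.9):
    """Return the most likely class name, or None if its score is below the threshold.

    The 0.9 default is a hypothetical value; in practice it would be tuned on
    validation data to balance missorted fruit against manual inspections.
    """
    if len(probabilities) != len(class_names):
        raise ValueError("expected one score per class")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] >= threshold:
        return class_names[best]
    return None  # not confident enough: e.g. divert the fruit for manual inspection


class_names = ["Apple Golden", "Banana", "Banana Red"]  # example names, shortened to three
print(sorting_decision([0.02, 0.95, 0.03], class_names))  # → Banana
print(sorting_decision([0.10, 0.50, 0.40], class_names))  # → None
```

A threshold turns the model's confidence level into an explicit trade-off: raising it sends fewer fruits to the wrong package but more to manual inspection.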

For those of us who are not fruit farmers

We have now shown how fruit can be sorted using machine learning. We are not fruit farmers, and chances are you are not one either. The good news is that machine vision has many other applications in an industrial setting, such as sorting trash, checking whether raw materials are suitable for further processing, quality control of finished products, and detecting unwanted objects on the assembly line. CNNs can be used to automate operations and quality inspection on your assembly line.

// Rasmus Steiniche, CEO @ neurospace

References

[1] Khan et al., A Guide to Convolutional Neural Networks for Computer Vision, Morgan & Claypool, 2018.

[2] Horea Muresan, Mihai Oltean, Fruit recognition from images using deep learning, Acta Univ. Sapientiae, Informatica Vol. 10, Issue 1, pp. 26-42, 2018.

Credit for dataset: Horea Muresan, Mihai Oltean, Fruit recognition from images using deep learning, Acta Univ. Sapientiae, Informatica Vol. 10, Issue 1, pp. 26-42, 2018.