Insect Image Classification Using CNN

Introduction

This blog walks through a Python implementation of a Convolutional Neural Network (CNN) using an insect classification example. CNNs are among the most widely used deep learning methods in image and video recognition, recommender systems, image classification, natural language processing, etc. In this blog, we apply a CNN via the tensorflow library in Python to classify insect images from the beetle, cockroach and dragonfly categories. The dataset used in this post can be obtained here, and the Python code can be found at my Github repository.

Data

The original dataset is from this website. In this post, we only use a small fraction of it, which contains beetle, cockroach and dragonfly images. Before the formal data manipulation, we can first get an overview of the relevant information.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow as tf

labels = ['beetles', 'cockroach', 'dragonflies']

for i in labels:
    path_train = './insects/train/' + i
    path_test = './insects/test/' + i
    print(f'Number of {i} images in the training set:', len(os.listdir(path_train)))
    print(f'Number of {i} images in the testing set:', len(os.listdir(path_test)))

Number of beetles images in the training set: 460
Number of beetles images in the testing set: 60
Number of cockroach images in the training set: 240
Number of cockroach images in the testing set: 60
Number of dragonflies images in the training set: 319
Number of dragonflies images in the testing set: 60

fig, axs = plt.subplots(3, 3, figsize = (12, 12))
axs = axs.ravel()

for i in range(len(labels)):
    path_train = './insects/train/' + labels[i]
    for j in range(3):
        np.random.seed(j)
        axs[3*i+j].imshow(mpimg.imread(path_train + '/' + np.random.choice(os.listdir(path_train))))
        axs[3*i+j].axis('Off')
        axs[3*i+j].set_title(labels[i])

[Figure: 3 x 3 grid of sample training images, one row per label]

Then the 1019 training images and 180 testing images can be imported from the corresponding folders through the built-in functions of tensorflow. The imported datasets are normalized by rescaling the pixel values by 1/255, which helps improve model performance.

# normalize
train_datagen = ImageDataGenerator(rescale = 1/255)
test_datagen = ImageDataGenerator(rescale = 1/255)

# import data from directory
train_generator = train_datagen.flow_from_directory('./insects/train/', 
                                                    target_size = (256, 256),
                                                    batch_size = 32,
                                                    class_mode='sparse')
test_generator = test_datagen.flow_from_directory('./insects/test/', 
                                                  target_size = (256, 256),
                                                  batch_size = 32,
                                                  class_mode='sparse')
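
As a rough illustration of what the two settings above do, here is a minimal pure-Python sketch. The tiny 2 x 2 image and the helper name rescale are made up for this example, and the label mapping assumes that flow_from_directory assigns integer indices to the class folders in alphanumeric order of their names.

```python
# Hypothetical 2 x 2 RGB image with 8-bit pixel values (0 to 255).
image = [[[0, 128, 255], [64, 64, 64]],
         [[255, 0, 0], [10, 20, 30]]]


def rescale(img, factor=1 / 255):
    """Multiply every pixel value by `factor`, mimicking rescale = 1/255."""
    return [[[value * factor for value in pixel] for pixel in row] for row in img]


scaled = rescale(image)  # all values now lie (up to rounding) in [0, 1]

# With class_mode='sparse', each image gets one integer label.
# Assumption: indices follow the alphanumeric order of the folder names.
class_names = ['beetles', 'cockroach', 'dragonflies']
class_indices = {name: index for index, name in enumerate(sorted(class_names))}
print(class_indices)
```

The actual mapping used by a generator can be checked through its class_indices attribute.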

The resulting data has shape:

Data              Shape
train_generator   (32, 256, 256, 3)
test_generator    (32,)

These shape values mean that the batch size of the training and testing data is 32, and that each image is represented by a 256 x 256 x 3 array. The 3 channels mean that the images are in RGB format, and each channel consists of 256 x 256 pixels.

The two datasets train_generator and test_generator will be used to train the CNN model and evaluate its performance based on classification accuracy.

CNN Model

The whole model is constructed with the tensorflow library. The Python implementation is:

model = tf.keras.models.Sequential([tf.keras.layers.Conv2D(16, 3, activation='relu', padding='same', input_shape=(256,256,3)),
                                    tf.keras.layers.MaxPool2D(),
                                    tf.keras.layers.Conv2D(32, 3, activation='relu', padding='same'),
                                    tf.keras.layers.MaxPool2D(),
                                    tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(3)])

model.compile(optimizer = 'adam',
              loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics = ['accuracy'])

history = model.fit(train_generator,
                    epochs = 16,
                    validation_data = test_generator)
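
Note that the last Dense(3) layer has no activation, so the model outputs raw scores (logits), and from_logits=True tells the loss to apply softmax internally. The sketch below shows that calculation for a single image in plain Python. The logit values are hypothetical and the function names are chosen only for this illustration; this is not the tensorflow implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(logits, label):
    """Negative log-probability of the true class, given an integer label."""
    return -math.log(softmax(logits)[label])


logits = [2.0, 0.5, -1.0]  # hypothetical output of Dense(3) for one image
probs = softmax(logits)
predicted_class = probs.index(max(probs))
loss_if_beetle = sparse_categorical_crossentropy(logits, label=0)
loss_if_dragonfly = sparse_categorical_crossentropy(logits, label=2)
print(predicted_class, loss_if_beetle, loss_if_dragonfly)
```

The loss is small when the largest logit belongs to the true class and large otherwise, which is what training tries to minimize.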

The basic structure of the classifier is:

[Figure: basic structure of the classifier]

Convolution Layer

Conv2D is the convolution layer. Its main objective is to extract features from the imported images while preserving the spatial relationship between pixels. To start with, let’s analyze the mechanism from the perspective of a single channel. In this case, the objective is realized by summarizing the information in each submatrix of the image with a filter, which is a small square matrix. The summarizing process multiplies the filter matrix with the image submatrix element by element and sums the products, and it can be visualized as below.

[Figure: applying a filter to an image submatrix]
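
The summarizing step for one submatrix can be sketched in a few lines of plain Python. The patch and filter values below are made up for illustration, and apply_filter is a hypothetical helper name, not a keras function.

```python
def apply_filter(patch, kernel):
    """Multiply patch and kernel element by element, then sum the products."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))


# Hypothetical 3 x 3 image submatrix and 3 x 3 filter.
patch = [[1, 0, 2],
         [3, 1, 0],
         [0, 2, 1]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

summary_value = apply_filter(patch, kernel)
print(summary_value)  # 1
```

Each submatrix is therefore reduced to a single number, and that number becomes one entry of the layer's output.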

With this procedure, a new question appears: how is the submatrix of the image determined? We know the submatrix must have the same size as the filter matrix for the element-wise multiplication to work, but do we need to visit every submatrix within the image matrix? These questions are answered by the stride parameter. Stride is the number of pixels the filter shifts over the input matrix at each step. If the stride is 1, the filter moves 1 pixel at a time. The movement of a 3 x 3 filter with stride 1 can be shown as:

[Figure: a 3 x 3 filter moving over the input with stride 1]
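
To make the role of stride concrete, here is a minimal single-channel sketch in plain Python without padding. The 5 x 5 input, the all-ones filter and the function name conv2d_single_channel are assumptions made for this example only.

```python
def conv2d_single_channel(image, kernel, stride=1):
    """Slide a square kernel over a 2-D image (no padding) with the given stride."""
    k = len(kernel)
    out_rows = (len(image) - k) // stride + 1
    out_cols = (len(image[0]) - k) // stride + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            top, left = r * stride, c * stride  # top-left corner of the submatrix
            row.append(sum(image[top + i][left + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        output.append(row)
    return output


# Hypothetical 5 x 5 image with values 0..24 and a 3 x 3 filter of ones.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

out_stride_1 = conv2d_single_channel(image, kernel, stride=1)  # 3 x 3 output
out_stride_2 = conv2d_single_channel(image, kernel, stride=2)  # 2 x 2 output
print(out_stride_1)
print(out_stride_2)
```

A larger stride visits fewer submatrices, so the output shrinks from 3 x 3 to 2 x 2 in this example.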

Extending this mechanism to the 3-channel image case, I specify my Conv2D layer with 16 filters of size 3 x 3, and keras, perhaps counter-intuitively, configures each filter as 3 x 3 x 3 so that it spans all input channels. Thus each 3 x 3 x 3 cube in the image array results in a single value for each filter. Since the stride of this layer is 1 by default, with input size 256 x 256 x 3, the output shape would be expected to be 254 x 254 x 16 based on the previous analysis. However, the resulting shape is actually 256 x 256 x 16. This is because the parameter padding is set to ‘same’: the input is surrounded with zeros so that the output has the same height and width as the input when the stride is 1. If padding is set to ‘valid’, the output shape will be exactly 254 x 254 x 16.
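
The output sizes quoted above can be reproduced with a small helper. This is a sketch assuming the usual convention for the two padding modes: floor((n - k) / s) + 1 for ‘valid’ and ceil(n / s) for ‘same’. The function name conv_output_size is hypothetical.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding='valid'):
    """Height (or width) of a convolution output along one dimension."""
    if padding == 'valid':
        # No padding: only positions where the filter fits entirely.
        return (input_size - kernel_size) // stride + 1
    if padding == 'same':
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


valid_size = conv_output_size(256, 3, stride=1, padding='valid')
same_size = conv_output_size(256, 3, stride=1, padding='same')
print(valid_size, same_size)  # 254 256
```

The third dimension of the output is not affected by padding; it always equals the number of filters, which is 16 for the first layer.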

As for the number of parameters in this layer, since there are 16 filters, each with 3 x 3 x 3 weights plus 1 weight for the bias, the total number of parameters will be (3 x 3 x 3 + 1) x 16 = 448.
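
The same counting rule can be written as a short function and applied to both convolution layers of the model. The helper name conv2d_param_count is made up for this sketch; the layer sizes come from the model definition above.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights per filter (k x k x in_channels) plus one bias, times the number of filters."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


first_conv_params = conv2d_param_count(kernel_size=3, in_channels=3, filters=16)
second_conv_params = conv2d_param_count(kernel_size=3, in_channels=16, filters=32)
print(first_conv_params, second_conv_params)  # 448 4640
```

The second layer has more parameters because each of its 32 filters spans the 16 channels produced by the first layer.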

Pooling Layer

MaxPool2D is the pooling layer. It reduces the spatial size of the feature maps, and therefore the number of parameters in later layers, when the image is too large. There are three types of pooling:

  • Max Pooling
  • Average Pooling
  • Sum Pooling

They represent the calculation method within each submatrix. For example, with 2 x 2 filters and stride 2, the max pooling and average pooling work like this:

[Figure: max pooling and average pooling with 2 x 2 filters and stride 2]

For our MaxPool2D layer, the default pool size is 2 x 2 and the default stride equals the pool size, i.e. 2 in both dimensions. Since the input has shape 256 x 256 x 16, the output shape of the MaxPool2D layer is 128 x 128 x 16. Because the calculation is simply taking the maximum value, there are no parameters to learn in this step.

The logic for the second round of convolution and pooling layers is the same: the second Conv2D layer outputs 128 x 128 x 32, and the second MaxPool2D layer outputs 64 x 64 x 32.
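
The three pooling types differ only in the function applied to each window, which a plain-Python sketch makes explicit. The 4 x 4 feature map is invented for this example, and pool2d and mean are hypothetical helper names.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pool2d(matrix, size=2, stride=2, reduce=max):
    """Apply `reduce` to every size x size window of a 2-D matrix."""
    rows = (len(matrix) - size) // stride + 1
    cols = (len(matrix[0]) - size) // stride + 1
    return [[reduce([matrix[r * stride + i][c * stride + j]
                     for i in range(size) for j in range(size)])
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical 4 x 4 single-channel feature map.
feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [3, 4, 1, 8]]

max_pooled = pool2d(feature_map, reduce=max)   # [[6, 4], [7, 9]]
avg_pooled = pool2d(feature_map, reduce=mean)  # [[3.75, 2.25], [4.0, 4.5]]
sum_pooled = pool2d(feature_map, reduce=sum)   # [[15, 9], [16, 18]]
print(max_pooled, avg_pooled, sum_pooled)
```

With a 2 x 2 window and stride 2, each dimension is halved, which is why a 256 x 256 input becomes 128 x 128. Pooling is applied to each channel separately, so the number of channels stays the same.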

Flatten Layer

Flatten is the flatten layer. It transforms the 3-dimensional output of the previous layer into a single column. Since its input has shape 64 x 64 x 32, the length of the column is 64 x 64 x 32 = 131072, and no parameters are used in this process.
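
A minimal sketch of the flattening step, using a tiny 2 x 2 x 3 volume instead of the real 64 x 64 x 32 one so that the result is easy to read. The function name flatten and the example values are assumptions for illustration.

```python
def flatten(volume):
    """Unroll a height x width x channels nested list into one flat list."""
    return [value for row in volume for pixel in row for value in pixel]


# Hypothetical 2 x 2 x 3 volume (height x width x channels).
small_volume = [[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]]

flat = flatten(small_volume)
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Length of the flattened vector for the 64 x 64 x 32 input of this model.
flattened_length = 64 * 64 * 32
print(flattened_length)  # 131072
```

Nothing is learned here: the values are only rearranged so that the following Dense layer can treat them as one long input vector.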

Fully Connected Layer

The final step is the fully connected layer Dense. The fully connected layer will combine all the features together and get the classification result based on certain activation functions.

[Figure: a fully connected layer]

Since all neurons are connected to each other, if the numbers of neurons in the input layer and output layer are $n_1$ and $n_2$ respectively, the number of parameters is ($n_1$ x $n_2$ + $n_2$), where the second term is the intercept (bias) weight for each output neuron. The number of neurons in the output layer is determined by the units parameter. Please note that the last fully connected layer in the classifier should have units equal to the number of categories in the classification problem. In this post, units should be 3 for the last Dense layer.

The whole process of the CNN model can be visualized as:
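
Applying this formula to the two Dense layers, and adding the convolution layers counted as (3 x 3 x input channels + 1) x filters, gives the size of the whole model. The sketch below is plain arithmetic under those assumptions; dense_param_count is a hypothetical helper name.

```python
def dense_param_count(n_inputs, n_units):
    """One weight per input-output pair plus one bias per output neuron."""
    return n_inputs * n_units + n_units


hidden_dense_params = dense_param_count(64 * 64 * 32, 128)  # after Flatten
output_dense_params = dense_param_count(128, 3)             # 3 insect classes

# Convolution layers: (3 x 3 x in_channels + 1) x filters.
first_conv_params = (3 * 3 * 3 + 1) * 16
second_conv_params = (3 * 3 * 16 + 1) * 32

total_params = (first_conv_params + second_conv_params
                + hidden_dense_params + output_dense_params)
print(hidden_dense_params, output_dense_params, total_params)
# 16777344 387 16782819
```

Almost all of the parameters sit in the first Dense layer, because it connects every one of the 131072 flattened values to each of its 128 neurons.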

[Figure: the whole process of the CNN model]

To avoid over-fitting, a possible solution is to add a Dropout layer to the CNN model. However, since we only have 1019 training samples in this case, such a layer is not used here.

Result

The final training and testing accuracy can be plotted against the number of epochs. As shown in the figure, after about 6 epochs, the training and testing accuracy both reach 100%. Therefore, the model performs excellently in both training accuracy and generalization ability. Convolutional Neural Network has demonstrated its power in image classification!

[Figure: training and testing accuracy by epoch]

Reference

[1] https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
[2] https://towardsdatascience.com/10-minutes-to-building-a-cnn-binary-image-classifier-in-tensorflow-4e216b2034aa
[3] https://heartbeat.fritz.ai/a-beginners-guide-to-convolutional-neural-networks-cnn-cf26c5ee17ed
[4] https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac
[5] https://stackoverflow.com/questions/55444120/understanding-the-output-shape-of-conv2d-layer-in-keras
