Dog Breed Classification: Part 2 (Deep Learning Model)

Shrinand Kadekodi
Aug 6, 2022

This post continues the previous one, where I scraped dog images from Google Search. You can read it here:
https://shrinandkadekodi.medium.com/dog-breed-classification-part-1-image-scraping-56622e97dd36
In this part we will train a Neural Network model that takes the scraped images as input. Let's start!!!

Google Colab

For training the model, I used the Google Colab environment, because training Neural Networks on images is computationally and memory intensive. Google Colab provides free GPU access for faster training, which makes it ideal for developers like me.
I uploaded the training data to Google Drive manually. Once it was uploaded, I created a new Colab notebook and mounted my Google Drive to access the image data. The Deep Learning models are built with TensorFlow, which comes preinstalled in Colab.
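For readers following along, here is a minimal sketch of that Colab setup, assuming the data sits in a Dog_Detection folder on Drive (the same folder the checkpoint path later in this post uses). It mounts Google Drive with google.colab.drive.mount and checks that TensorFlow can see a GPU. The train and test subfolder names are assumptions about where the scraped images were uploaded.

```python
import os

import tensorflow as tf

# Assumed layout on Google Drive; the "train" and "test" subfolder names are hypothetical.
MOUNT_POINT = "/content/drive"
DATA_ROOT = os.path.join(MOUNT_POINT, "MyDrive", "Dog_Detection")
train_dir = os.path.join(DATA_ROOT, "train")
test_dir = os.path.join(DATA_ROOT, "test")


def mount_drive(mount_point=MOUNT_POINT):
    """Mount Google Drive when running inside Colab; return False anywhere else."""
    try:
        from google.colab import drive  # this module only exists inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


mounted = mount_drive()
# An empty list means TensorFlow sees no GPU (in Colab: Runtime > Change runtime type).
gpus = tf.config.list_physical_devices("GPU")
print("Drive mounted:", mounted, "| GPUs:", gpus)
```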

Convolutional Neural Network

Convolutional Neural Networks (CNNs) are the standard approach to image classification in Deep Learning. I will skip explaining how CNNs work, as honestly there are plenty of resources that explain them well. Some of them are below:
https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/
https://youtu.be/FmpDIaiMIeA

Implementation

The first step is to load the images, labelled by dog breed. TensorFlow provides a built-in utility for this, tf.keras.utils.image_dataset_from_directory, which reads all the images in a directory and labels each one according to the subfolder it sits in. For this to work, the data must follow a fixed folder structure: one subfolder per class inside the train (and test) folder. This was taken care of in the scraping exercise, where the train folder contains one folder per class (i.e. per dog breed name).
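To make the expected layout concrete, the sketch below builds a throwaway copy of it in a temporary folder and lists the class names the way image_dataset_from_directory infers them with its default labels='inferred': one class per subfolder, in alphanumeric order. The four breed names are placeholders, not the breeds from the scraping post. That sorted order is also the order of class_names and of the label positions.

```python
import os
import tempfile

# Placeholder breed names, for illustration only.
BREEDS = ["pug", "beagle", "labrador", "golden_retriever"]


def make_layout(root, breeds, images_per_class=2):
    """Create root/{train,test}/<breed>/ folders holding empty placeholder .jpg files."""
    for split in ("train", "test"):
        for breed in breeds:
            folder = os.path.join(root, split, breed)
            os.makedirs(folder, exist_ok=True)
            for i in range(images_per_class):
                # Empty files only illustrate the layout; real training needs real images.
                open(os.path.join(folder, f"{breed}_{i}.jpg"), "wb").close()


def inferred_class_names(split_dir):
    """One class per subfolder, sorted alphanumerically (the default ordering)."""
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


root = tempfile.mkdtemp()
make_layout(root, BREEDS)
print(inferred_class_names(os.path.join(root, "train")))
# → ['beagle', 'golden_retriever', 'labrador', 'pug']
```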
We can directly use this function as below:

import tensorflow as tf
import numpy as np

# train_dir and test_dir are the paths to the scraped train and test folders
train_gen = tf.keras.utils.image_dataset_from_directory(directory=train_dir,
                                                        image_size=(224, 224),
                                                        label_mode='categorical')
test_gen = tf.keras.utils.image_dataset_from_directory(directory=test_dir,
                                                       image_size=(224, 224),
                                                       label_mode='categorical')
classNames = train_gen.class_names  # breed names, inferred from the subfolder names

Since this is a classification problem with multiple classes, label_mode is set to 'categorical', which encodes each label as a one-hot vector. The images are also resized to (224, 224). The reason for using this size will become clear later; for now, this is simply the size every image is resized to.
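To illustrate what label_mode='categorical' produces: each label becomes a vector as long as the number of classes, with a 1 at the position of the breed in class_names and 0 elsewhere. The sketch below reproduces that encoding with NumPy (it is not TensorFlow's internal code) using placeholder breed names, and shows how argmax maps a vector back to a breed.

```python
import numpy as np

class_names = ["beagle", "golden_retriever", "labrador", "pug"]  # placeholder breeds


def one_hot(int_labels, num_classes):
    """Integer class indices -> one-hot rows, the format label_mode='categorical' yields."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(int_labels)]


labels = one_hot([2, 0, 3], len(class_names))
print(labels)
# [[0. 0. 1. 0.]
#  [1. 0. 0. 0.]
#  [0. 0. 0. 1.]]
print([class_names[i] for i in labels.argmax(axis=1)])  # ['labrador', 'beagle', 'pug']
```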
After loading the train and test data, the next step is to create the Deep Learning model. It's really simple to create a Deep Learning model in TensorFlow. It could get complex if we were to use custom layers or design our own model, but in this exercise I have used the built-in layers provided by TensorFlow. Using the Sequential model, we can build a CNN to detect the dog breed. The code is below:

# simple CNN network
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=64, kernel_size=3, input_shape=(224, 224, 3), activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax')  # one output per dog breed (4 classes)
])

model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])
history = model.fit(train_gen, epochs=20, steps_per_epoch=len(train_gen),
                    validation_data=test_gen, validation_steps=len(test_gen))

A quick look at the code shows that each layer receives its input from the preceding layer. Six different types of layers are used in the sequence.
A short explanation of each layer and its functionality is given below:
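a. The Conv2D (convolution) layer, as the name suggests, slides a set of learned filters (here 64 filters of size 3×3) over the image to extract features such as edges and textures, followed by a ReLU activation. With the default 'valid' padding, each convolution also shrinks the image by 2 pixels in height and width.
b. The MaxPool2D layer downsamples the feature maps by keeping only the maximum value in each 2×2 window, roughly halving their height and width.
c. The Dropout and BatchNormalization layers help the model generalise: Dropout randomly sets a fraction (here 20%) of its inputs to zero during training, which reduces overfitting, while BatchNormalization normalises the activations of the preceding layer, which stabilises training and has a mild regularising effect.
d. The Flatten layer converts the reduced 3D stack of feature maps into a one-dimensional vector.
e. Lastly, the Dense (fully connected) layers combine the extracted features to learn the actual classification. Note that the last Dense layer must always have as many outputs as there are classes, and it uses softmax activation rather than ReLU so that its outputs form a probability for each breed.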
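How much the image is "reduced" in points a, b and d can be worked out by hand. Below is a small pure-Python sketch (not a Keras API) that tracks the feature-map size through the CNN above, assuming the Keras defaults this code relies on: padding='valid' and stride 1 for Conv2D, and strides equal to pool_size for MaxPool2D. It also counts the weights of the first Dense layer.

```python
def conv_out(size, kernel=3, stride=1):
    """Output size of a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size ('valid' padding)."""
    return (size - pool) // pool + 1


size, filters = 224, 64
for block in range(3):  # three Conv2D(64, 3) + MaxPool2D((2, 2)) blocks
    size = pool_out(conv_out(size))
    print(f"after block {block + 1}: {size}x{size}x{filters}")
# prints 111x111x64, then 54x54x64, then 26x26x64

flat = size * size * filters       # length of the Flatten output
dense_params = flat * 128 + 128    # weights + biases of the first Dense(128)
print(flat, dense_params)          # 43264 5537920
```

So about 5.5 million of the model's roughly 5.6 million parameters sit in that single Dense layer, which is worth keeping in mind when a small from-scratch CNN like this one overfits.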
The last step is to compile the model and fit it on the data. The loss used is CategoricalCrossentropy, which suits multi-class problems with one-hot labels (label_mode='categorical'). The optimizer is Adam (others such as SGD can also be tried), and the metric tracked is accuracy. I trained the model on the train data for 20 epochs. model.fit returns a History object, stored here in the history variable, which records the loss and accuracy for every epoch; the trained weights themselves stay in model.
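For reference, categorical cross-entropy compares the one-hot label y with the softmax output p as L = -Σ_c y_c log(p_c), so only the probability assigned to the true breed matters. The NumPy sketch below illustrates the formula; it is not Keras' implementation, though like Keras it clips probabilities away from 0 and 1 to avoid log(0).

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (subtract the max for stability)."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss -sum(y * log(p)), with p clipped to [eps, 1 - eps]."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


y_true = np.array([[0.0, 0.0, 1.0, 0.0]])              # one-hot label: class index 2
uniform = softmax(np.zeros((1, 4)))                     # no preference: 0.25 per class
confident = softmax(np.array([[0.0, 0.0, 8.0, 0.0]]))  # strongly favours class 2
print(categorical_crossentropy(y_true, uniform))    # about 1.386, i.e. log(4)
print(categorical_crossentropy(y_true, confident))  # about 0.001
```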

The accuracy barely touched 70%. I tried different combinations, but without success. You can see the model loss and accuracy graph below:
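Curves like these can be drawn from the History object returned by model.fit: its history attribute is a dict with one list per metric ('loss' and 'accuracy', plus 'val_loss' and 'val_accuracy' when validation_data is passed). A minimal matplotlib sketch, assuming matplotlib is available (it is preinstalled in Colab); plot_history is a hypothetical helper name.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict, metrics=("loss", "accuracy")):
    """Plot training vs validation curves from a Keras History.history dict."""
    fig, axes = plt.subplots(1, len(metrics), figsize=(6 * len(metrics), 4), squeeze=False)
    for ax, metric in zip(axes[0], metrics):
        epochs = range(1, len(history_dict[metric]) + 1)
        ax.plot(epochs, history_dict[metric], label=f"train {metric}")
        val_key = f"val_{metric}"
        if val_key in history_dict:  # present only when validation_data was given to fit
            ax.plot(epochs, history_dict[val_key], label=f"validation {metric}")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    return fig


# Usage, with the `history` returned by model.fit above:
# fig = plot_history(history.history)
# fig.savefig("cnn_history.png")
```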

It is not good, I agree, and I could not go ahead with it. I needed at least 75% accuracy for the model (not that accuracy is the best metric; for classification, the F-score is often a better choice).
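Since the F-score comes up here: a small NumPy sketch of the per-class and macro-averaged F1 score, computed from integer class indices (for example, the argmax of the softmax outputs and of the one-hot labels). scikit-learn's f1_score(..., average='macro') should give the same result; the hand-rolled version just makes the formula F1 = 2PR / (P + R) visible. The labels in the example are made up.

```python
import numpy as np


def macro_f1(y_true, y_pred, num_classes):
    """Per-class F1 = 2PR / (P + R), then the unweighted mean over classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # predicted c, truly c
        fp = np.sum((y_pred == c) & (y_true != c))  # predicted c, truly something else
        fn = np.sum((y_pred != c) & (y_true == c))  # missed a true c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(float(f1))
    return float(np.mean(f1s)), f1s


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score, per_class = macro_f1(y_true, y_pred, num_classes=3)
print([round(f, 3) for f in per_class])  # [0.5, 0.8, 0.667]
print(round(score, 3))                   # 0.656 (plain accuracy here is 4/6)
```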
The next step was, well, as you may have guessed, to go with Transfer Learning.

Transfer Learning

The basic premise of Transfer Learning is to take an existing, already trained model and use it as the starting point for training on our own data. Note that the tasks should be related to each other: for example, knowledge gained while learning to recognise dogs could help when recognising wolves, knowledge gained for cars could be used to recognise trucks, and so on.
One more thing to keep in mind is that the pretrained model already captures general, low-level features (edges, textures, shapes), but the finer, task-specific features still need to be learned. For this we can keep the lower layers frozen and train the upper layers on our training data.
The same has been done below. I selected EfficientNet as the transfer learning model because … well, it's efficient 😅 (and the TensorFlow comparison graph shows it performing best).

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.models import Sequential

def createModel(classesNum=10):
    model = models.Sequential()
    # EfficientNetB0 pretrained on ImageNet, without its ImageNet classification head
    model.add(EfficientNetB0(include_top=False, weights="imagenet", input_shape=(224, 224, 3)))
    # new classification head for the dog breeds
    model.add(tf.keras.layers.GlobalAveragePooling2D())
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Dropout(0.02))
    model.add(tf.keras.layers.Flatten())  # no-op: GlobalAveragePooling2D output is already flat
    model.add(layers.Dense(classesNum, activation="softmax"))
    return model

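def unfreeze_model(model):
    # Set the last 20 entries of model.layers to trainable, skipping BatchNormalization layers.
    # In this Sequential model EfficientNetB0 is one nested layer, so model.layers has only six
    # entries and the slice covers the whole model rather than the top 20 EfficientNet layers.
    for layer in model.layers[-20:]:
        if not isinstance(layer, layers.BatchNormalization):
            layer.trainable = True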

# callback to save the model weights to Google Drive after every epoch
callbackSave = tf.keras.callbacks.ModelCheckpoint(filepath='/content/drive/MyDrive/Dog_Detection/',
                                                  save_weights_only=True,
                                                  save_freq='epoch',
                                                  verbose=1)
efficientnetModel = createModel(len(train_gen.class_names))
efficientnetModel.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                          optimizer=tf.keras.optimizers.Adam(),
                          metrics=['accuracy'])
unfreeze_model(efficientnetModel)
historyEfficient = efficientnetModel.fit(train_gen, epochs=20, steps_per_epoch=len(train_gen),
                                         validation_data=test_gen, validation_steps=len(test_gen),
                                         callbacks=[callbackSave])
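The unfreeze_model function mirrors the one in the Keras EfficientNet fine-tuning example listed in the references, where the model is built with the functional API so that model.layers lists every EfficientNet layer. Below is a sketch of that example's two-stage pattern adapted to a Sequential model: first train only the new head on a frozen base, then make the top 20 layers of the base itself trainable (BatchNormalization stays frozen) and recompile with a small learning rate. The helper names build_model and unfreeze_top and the 1e-5 learning rate are assumptions for this illustration, not the code used for the results in this post.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(num_classes, weights="imagenet"):
    """Stage 1: frozen EfficientNetB0 base plus a new classification head."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=(224, 224, 3))
    base.trainable = False  # freeze every pretrained layer; only the head trains
    model = tf.keras.Sequential([
        base,
        layers.GlobalAveragePooling2D(),  # already yields a flat (batch, channels) tensor
        layers.BatchNormalization(),
        layers.Dropout(0.02),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=["accuracy"])
    return model, base


def unfreeze_top(model, base, n=20, learning_rate=1e-5):
    """Stage 2: make the last n layers of the base trainable, except BatchNormalization."""
    base.trainable = True               # sets every sublayer to trainable...
    for layer in base.layers[:-n]:      # ...then re-freeze everything below the top n
        layer.trainable = False
    for layer in base.layers[-n:]:
        if isinstance(layer, layers.BatchNormalization):
            layer.trainable = False
    # Changes to `trainable` only take effect after compiling again.
    model.compile(loss="categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  metrics=["accuracy"])


# Usage (train_gen / test_gen from earlier):
# model, base = build_model(len(train_gen.class_names))
# model.fit(train_gen, validation_data=test_gen, epochs=20)  # stage 1: train the head
# unfreeze_top(model, base)
# model.fit(train_gen, validation_data=test_gen, epochs=20)  # stage 2: fine-tune the top
```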

EfficientNet comes in several variants (B0 to B7), each designed for a different input image size. The one I used, EfficientNetB0, expects input images of size (224, 224), which is why I resized the images to that size when loading them. You can check the different versions here.
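As a quick reference, the input resolutions the EfficientNet B0 to B7 variants were designed for are listed below as a small lookup, so that image_size in image_dataset_from_directory can follow the chosen variant. The dictionary and the image_size_for helper are hypothetical names introduced for this sketch.

```python
# Native input resolution of each EfficientNet variant in tf.keras.applications.
EFFICIENTNET_INPUT_SIZE = {
    "EfficientNetB0": 224,
    "EfficientNetB1": 240,
    "EfficientNetB2": 260,
    "EfficientNetB3": 300,
    "EfficientNetB4": 380,
    "EfficientNetB5": 456,
    "EfficientNetB6": 528,
    "EfficientNetB7": 600,
}


def image_size_for(variant):
    """Return the (height, width) to pass as image_size when loading the images."""
    side = EFFICIENTNET_INPUT_SIZE[variant]
    return (side, side)


print(image_size_for("EfficientNetB0"))  # (224, 224)
# e.g. tf.keras.utils.image_dataset_from_directory(train_dir,
#          image_size=image_size_for("EfficientNetB3"), label_mode="categorical")
```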
After fine-tuning for 20 epochs, the accuracy was above 80%. The ModelCheckpoint callback saves the trained weights to Google Drive after every epoch. I tested the model on a small number of images and the results were, well, not bad … better than the earlier model.
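Testing on individual images can be done along these lines. This is a sketch: load_image and predict_breed are hypothetical helpers, and the usage line assumes the efficientnetModel and classNames defined earlier. The image is resized to 224×224 with bilinear interpolation and kept in the 0 to 255 range, which matches how image_dataset_from_directory prepared the training images (its default interpolation is bilinear and it does not rescale pixel values).

```python
import numpy as np
import tensorflow as tf


def load_image(path, image_size=(224, 224)):
    """Load one image as a float32 array of shape (224, 224, 3) with values in 0-255."""
    img = tf.keras.utils.load_img(path, target_size=image_size, interpolation="bilinear")
    return tf.keras.utils.img_to_array(img)


def predict_breed(model, image, class_names):
    """Return the most probable breed and its softmax probability for a single image."""
    batch = np.expand_dims(image, axis=0)       # the model expects a batch dimension
    probs = model.predict(batch, verbose=0)[0]  # softmax output, one value per class
    best = int(np.argmax(probs))
    return class_names[best], float(probs[best])


# Usage (the image path is hypothetical):
# breed, confidence = predict_breed(efficientnetModel, load_image("my_dog.jpg"), classNames)
```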

In this exercise, I created a simple model to detect dog breeds using TensorFlow and Transfer Learning. In the next part, I will show how to build a small application that takes an image as input and uses this model to display the predicted breed. I hope this post gave a small glimpse of how to train and save models using TensorFlow.

References:
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/
Along with these, lots of Googling and Stack Overflow.

Originally published at http://evrythngunder3d.wordpress.com on August 6, 2022.
