
RajeshDai Detector: Creating a Simple Binary Image Classifier with Neural Networks

A product of inspiration from Jian-Yang’s Hot Dog / Not Hot Dog detector

Wait, who is Rajesh Dai?

Rajesh Hamal is to Nepal and the Nepalese what Chuck Norris is to Hollywood and Rajinikanth is to India/Bollywood. There are countless jokes, some original and most translated or adapted, about his “powers”. He seems to enjoy those jokes himself, and sometimes he recites them. The first megastar of the Nepalese movie industry, he has a cult following that no other Nepalese actor enjoys. People across three generations call him “Rajesh Dai” (Dai = Brother) with love. While I admit I have not watched his movies in the last decade, this one is for the childhood memories.

Inspired by Jian-Yang’s Hot Dog detector app as seen in the TV series Silicon Valley (it’s a great series, check it out!), in this project I am building a binary image classifier that detects whether an image is of Rajesh Hamal or not. I used Rajesh Hamal here, but you can apply this to any image class as long as you have enough data. Let’s get started.

Dataset

I created this dataset, with the help of Ritu, by manually collecting 100 low-resolution images of Rajesh Hamal and 100 of other random Nepalese actors. The images have different sizes and resolutions, as previewed in Google Image Search, and they are all in .jpg format. The dataset can now be downloaded from Kaggle here.


Imports / Libraries


We will use PIL for image manipulation, and TensorFlow with Keras for building the neural network. Scikit-learn and Seaborn will aid with the plots. Other imports like os and random handle common Python tasks.
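
A minimal set of imports covering those libraries might look like this (the exact list in the original notebook may differ slightly):

```python
import os
import random
import zipfile

import numpy as np
from PIL import Image, ImageOps

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
```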

Data Import and Manipulation

Next, we will get the data (via a Kaggle download, etc.). This step can differ depending on whether you are running this on Kaggle / Google Colab or a local machine, but the end state is that you have two folders, ‘rajeshdai’ and ‘others’, within the ‘images’ folder.

If you have the two zip files in the root folder, the following code snippet will unzip the dataset into the appropriate folders and clean up the mess.
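
Here is a sketch of that clean-up step; the archive names rajeshdai.zip and others.zip are assumptions, so adjust them to whatever your downloads are called.

```python
import os
import zipfile

# Assumed archive names; adjust to match your downloads.
archives = {"rajeshdai.zip": "images/rajeshdai", "others.zip": "images/others"}

for zip_name, target_dir in archives.items():
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_name) as zf:
        zf.extractall(target_dir)
    os.remove(zip_name)  # clean up the archive after extraction
```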

Next, we will read the data and display a sample of each class.
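
Something along these lines works for a quick sanity check, assuming the folder layout described above and the imports from earlier:

```python
# Build file lists for both classes and show one sample image from each.
rajesh_files = [os.path.join("images/rajeshdai", f)
                for f in os.listdir("images/rajeshdai")]
other_files = [os.path.join("images/others", f)
               for f in os.listdir("images/others")]

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(Image.open(rajesh_files[0]))
axes[0].set_title("rajeshdai")
axes[1].imshow(Image.open(other_files[0]))
axes[1].set_title("others")
for ax in axes:
    ax.axis("off")
plt.show()
```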

Next, we need to write a function to pre-process the data. Also, since we have only 100 images in each class, we will generate more by slightly modifying them (a sketch of this function follows the list):

  • Flipping (vertical)
  • Mirroring (horizontal)
  • Slightly rotating (by a random angle between -20 and 20 degrees)
  • Resizing to m x n (here, we simply resize to 256 x 256). This creates distorted images, stretched or squeezed into unnatural shapes, but the goal is to make the classifier work despite all that.
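
A possible version of that pre-processing and augmentation function, using PIL; the exact set of variants and the normalization in the original may differ:

```python
# Each original image yields several variants: the resized original, a
# vertical flip, a horizontal mirror, and a small random rotation.
def expand_image(path, size=(256, 256)):
    img = Image.open(path).convert("RGB").resize(size)
    variants = [
        img,
        ImageOps.flip(img),                    # vertical flip
        ImageOps.mirror(img),                  # horizontal mirror
        img.rotate(random.uniform(-20, 20)),   # slight random rotation
    ]
    # Scale pixel values to [0, 1] for the network.
    return [np.asarray(v, dtype="float32") / 255.0 for v in variants]
```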

In the next step, we get the expanded dataset using the function above and append labels to it. We also shuffle the data and do a training-test split using scikit-learn.
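
A sketch of that step, reusing the illustrative expand_image function and file lists from the snippets above; the test_size value here is an assumption:

```python
# Stack the augmented images into arrays with labels
# (1 = Rajesh Hamal, 0 = others), then shuffle and split.
X, y = [], []
for label, files in [(1, rajesh_files), (0, other_files)]:
    for path in files:
        for variant in expand_image(path):
            X.append(variant)
            y.append(label)

X, y = np.array(X), np.array(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, shuffle=True, random_state=42
)
```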

Now, we have the training and test data ready to go!

Building a Neural Network Model

Next, we proceed to make a simple neural network model. The model has three convolutional layers and two fully connected (dense) layers. There are a handful of dropout layers to prevent overfitting and pooling layers to discard less important information.
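
One way to lay out such a model in Keras; the filter counts, dropout rates, and dense-layer width here are assumptions, not the exact values used:

```python
# Three conv blocks (conv + pooling + dropout), then two dense layers.
model = keras.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])
```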

This might look big and complicated, but that is because half of it is just flattening/pooling/dropout layers, so the model itself is not very big. We can visualize the model as follows:
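
For the visualization, either a text summary or a layer diagram works (plot_model needs pydot and Graphviz installed):

```python
# Text summary of the layers, plus an optional diagram saved to disk.
model.summary()
keras.utils.plot_model(model, show_shapes=True, to_file="model.png")
```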

Next, we train the model. The training itself took less than a minute in Google Colab using a GPU. If trained on a CPU-only machine, it will probably take a few minutes, but nothing too crazy.
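
A sketch of the compile-and-fit step, with early stopping on the validation loss; the optimizer, patience, batch size, and epoch cap are assumptions:

```python
# Binary cross-entropy loss with early stopping on validation loss.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                           patience=3,
                                           restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_split=0.2,
                    epochs=50,
                    batch_size=32,
                    callbacks=[early_stop])
```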

Result

Let’s see the training and test performance. The metrics are produced by code snippets below:

With a validation split of 0.2, we were able to get a validation accuracy of 93% when early stopping kicked in at 12 epochs.

Plotting the accuracy:
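
For example, using the history object returned by model.fit:

```python
# Training vs. validation accuracy across epochs.
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```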

Next, we check the accuracy with the test data: a never-before-seen data sample.
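
A one-liner with Keras’ evaluate covers this:

```python
# Evaluate on the held-out test set.
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
```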

With the test data, the accuracy is 92.22%.

Let’s have a closer look at the test result:
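
A normalized confusion matrix drawn with Seaborn gives this breakdown; the 0.5 decision threshold is the usual default for a sigmoid output:

```python
# Normalized confusion matrix of test predictions, drawn with Seaborn.
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
cm = confusion_matrix(y_test, y_pred, normalize="all")
sns.heatmap(cm, annot=True, fmt=".2%",
            xticklabels=["others", "rajeshdai"],
            yticklabels=["others", "rajeshdai"])
plt.xlabel("predicted")
plt.ylabel("actual")
plt.show()
```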

We can see that the model has over a 90% success rate. The model has slightly more false negatives (predicting ‘others’ for Rajesh Hamal’s pictures 5.6% of the time) than false positives (predicting ‘Rajesh Hamal’ for others’ pictures 2.2% of the time).

Overall though, without using any pre-trained models, an accuracy of over 92% is a great result, especially considering we started with a very small sample size.

Conclusion

In this project, we explored using a balanced but relatively small dataset for image classification. By expanding the dataset, generating more training data by tweaking the originals, we were able to train a convolutional neural network (CNN) using TensorFlow and Keras. With under one minute of training time*, we achieved over 90% accuracy on binary image classification.

Next time, I plan to try transfer learning using a pre-trained model to start with.

The full code (as well as the dataset) can be found on my GitHub here.
