Hire the author: Merishna S

Manually monitoring people entering public institutions is tedious and labor-intensive. In this tutorial, we will learn how to build a face mask classifier using deep learning that automatically detects people not wearing masks, so that their entry into cafeterias, restaurants, schools, etc. can be prevented.

Introduction and Motivation

The CDC continues to monitor the spread of COVID-19 and advises people who are completely vaccinated as well as those who are not fully vaccinated to wear face masks. When visiting the doctor’s office, hospitals, or long-term care institutions, the CDC recommends wearing masks and keeping a safe distance.

This tutorial will walk you through detecting faces in an image loaded from a file path and developing a deep learning model that classifies each detected face. We will then use our trained model to determine whether a person is violating public rules by not wearing a mask.


Deep Learning: A machine learning technique that learns from data using neural networks loosely inspired by the human brain.


  1. Programming knowledge in Python.
  2. Basic knowledge of Jupyter Notebook, Deep Learning, Keras.

Creating the deep learning face mask classifier model

First, we will build a deep learning model that predicts whether a person is violating the rules by not wearing a mask in public spaces. All the steps discussed below are also available in a notebook here.

Step 1: Installing and Importing the necessary Python libraries

Clone this repository and install the libraries by using the command:
pip install -r requirements.txt

Note: At the time of writing, there was no stable TensorFlow release for Python >= 3.9. If pip cannot find a matching tensorflow version for your interpreter, you can install the nightly build instead:
pip install tf-nightly

Now, we will import the necessary libraries in Python.

import numpy as np # linear algebra
import cv2 # opencv
import matplotlib.pyplot as plt # image plotting
# keras
from keras import Sequential
from keras.layers import Flatten, Dense
from keras.applications.vgg19 import VGG19
from keras.applications.vgg19 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator

Step 2: Getting the training data

For the training data, we are using the face mask detection data from here. The dataset, created by Jessica Li, contains 12 thousand images scraped from Google and the CelebFace dataset, divided into Test, Train, and Validation sets. To start using it, download the dataset and save it in the working directory.

# Load train and test set
train_dir = "data/Face Mask Dataset/Train"
test_dir = "data/Face Mask Dataset/Test"
val_dir = "data/Face Mask Dataset/Validation"

Sample training data with mask

We can see that our training data includes people wearing masks of many different kinds and patterns, including difficult cases. Similarly, the training data without masks contains faces under varied lighting, with and without beards, which will help the model handle different scenarios when identifying faces with no masks.
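
Before training, it can help to check how many images each class folder holds. The sketch below assumes each split directory contains one subfolder per class, which is the layout `flow_from_directory` expects. The helper name `count_images_per_class` and the list of extensions are illustrative and not part of the original notebook.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the downloaded dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def count_images_per_class(split_dir):
    """Return {class_name: image_count} for one dataset split directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

Calling `count_images_per_class(train_dir)` would then show whether the two classes are roughly balanced.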

Sample training data without mask

Step 3: Reading a sample image and performing face detection

Next, we will read in a sample image of a public space and perform face detection using a Haar cascade classifier.

The Haar cascade classifier, based on the Viola-Jones face detection technique, is an object detection algorithm for detecting faces in images or real-time video. Viola and Jones proposed Haar-like features that respond to edges and lines in their research paper “Rapid Object Detection using a Boosted Cascade of Simple Features,” published in 2001. The algorithm is trained on a large number of positive images containing faces and a large number of negative images without faces. The pretrained model resulting from this training can be found in the OpenCV GitHub repository.[1]
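
To make the idea of an edge feature more concrete, here is a simplified sketch of a two-rectangle Haar-like feature computed with an integral image, the summed-area table that lets any rectangle sum be read with four lookups. This is an illustration only and not OpenCV's implementation; the function names are our own.

```python
import numpy as np


def integral_image(gray):
    """Summed-area table with an extra zero row and column for easy lookups."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def two_rect_edge_feature(ii, x, y, w, h):
    """Vertical-edge feature: sum of the left half minus sum of the right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

A large absolute value means the two halves of the window differ strongly in brightness, which is what happens at an edge. A cascade combines many such features to decide whether a window contains a face.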

# Read a sample image (OpenCV loads images in BGR channel order)
img = cv2.imread("sample_images/image (1).png")
# Keep a coloured copy in RGB order for plotting with matplotlib
orig_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Convert the image to grayscale for the Haar cascade detector
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Load haarcascade_frontalface_default.xml
face_detection_model = cv2.CascadeClassifier("data/haarcascade_frontalface_default.xml")
# Detect faces in the given image
return_faces = face_detection_model.detectMultiScale(
    gray_img, scaleFactor=1.07, minNeighbors=4
)  # returns an array of (x, y, w, h) boxes
# Draw the returned boxes on the coloured copy
for (x, y, w, h) in return_faces:
    cv2.rectangle(orig_img, (x, y), (x + w, y + h), (0, 0, 255), 1)
plt.figure(figsize=(12, 12))
plt.imshow(orig_img)  # display the image

Face detection using Haar cascades

Note: If you want to try using a different image, change the image path in the above code as follows:

img = cv2.imread("<path to new image file>")

Step 4: Data preprocessing for building the face mask classifier in Keras

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result as fast as possible is key to doing good research.[2]

We will now pass our datasets into Keras ImageDataGenerator() to rescale the pixel values and, for the training set, apply data augmentation steps such as horizontal flips, zoom, and shear.

# Data preprocessing
# Train data: rescale pixel values and apply random augmentation
train_datagenerator = ImageDataGenerator(
    rescale=1.0 / 255, horizontal_flip=True, zoom_range=0.2, shear_range=0.2
)
train_generator = train_datagenerator.flow_from_directory(
    directory=train_dir, target_size=(128, 128), class_mode="categorical", batch_size=32
)
# Validation data: rescale only, no augmentation
val_datagenerator = ImageDataGenerator(rescale=1.0 / 255)
val_generator = val_datagenerator.flow_from_directory(
    directory=val_dir, target_size=(128, 128), class_mode="categorical", batch_size=32
)
# Test data: rescale only, read from the held-out Test split
test_datagenerator = ImageDataGenerator(rescale=1.0 / 255)
test_generator = test_datagenerator.flow_from_directory(
    directory=test_dir, target_size=(128, 128), class_mode="categorical", batch_size=32
)
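
To see what two of these options do to a single image, the following NumPy sketch reproduces rescaling and a horizontal flip by hand. It is not how Keras implements them internally (Keras applies the flip randomly per image), and the function names are illustrative.

```python
import numpy as np


def rescale(image, factor=1.0 / 255):
    """Map 8-bit pixel values from the range [0, 255] to roughly [0, 1]."""
    return image.astype(np.float32) * factor


def horizontal_flip(image):
    """Mirror an image of shape (height, width, channels) left to right."""
    return image[:, ::-1, :]
```

Rescaling keeps the inputs in a small numeric range, while flipping gives the model mirrored versions of the same faces.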

Step 5: Create the face mask classifier transfer learning model using Keras

We are building the deep learning classifier using the VGG19 transfer learning model. VGG19 is a variant of the VGG architecture, a successor to AlexNet named after the Visual Geometry Group at Oxford, which created it. It is a deep CNN with 19 weight layers (16 convolutional layers and 3 fully connected layers), along with 5 max-pooling layers and a final softmax layer, used to classify images.[4]

Additionally, the pretrained weights we load come from ImageNet, an image database with 14,197,122 images structured according to the WordNet hierarchy.
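
The layer counts can be checked with a few lines of arithmetic. The list below writes out the standard VGG19 convolutional configuration, where each number is the channel count of a 3x3 convolution and "M" marks a max-pooling layer; the variable names are our own.

```python
# Standard VGG19 convolutional configuration ("M" = 2x2 max pooling).
VGG19_CONFIG = [
    64, 64, "M",
    128, 128, "M",
    256, 256, 256, 256, "M",
    512, 512, 512, 512, "M",
    512, 512, 512, 512, "M",
]
NUM_FC_LAYERS = 3  # fully connected layers in the original classifier head

num_conv = sum(1 for v in VGG19_CONFIG if v != "M")
num_pool = VGG19_CONFIG.count("M")
num_weight_layers = num_conv + NUM_FC_LAYERS

print(num_conv, num_pool, num_weight_layers)  # 16 5 19
```

Only the layers with trainable weights (convolutional and fully connected) count toward the 19 in the name.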

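Architecture of VGG19 CNN network (Source)

# Initializing the VGG19 model without its fully connected top layers
vgg19_model = VGG19(weights="imagenet", include_top=False, input_shape=(128, 128, 3))
# Freeze the pretrained layers so only the new classifier head is trained
for layer in vgg19_model.layers:
    layer.trainable = False
# Initialize a sequential model
model = Sequential()
model.add(vgg19_model)  # pretrained feature extractor
model.add(Flatten())  # flatten the feature maps into a vector
model.add(Dense(2, activation="sigmoid"))  # two outputs: mask / no mask
model.summary()
# Compiling the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

Model summary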
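
As a rough sanity check on the model summary, the size of the new classifier head can be worked out by hand. The sketch assumes the frozen VGG19 base (5 pooling stages, 512 output channels) is followed by a Flatten layer and the two-unit Dense layer; the helper names are illustrative.

```python
def flattened_features(input_size=128, num_pools=5, channels=512):
    """Each 2x2 max pooling halves the spatial size of the feature map."""
    side = input_size // (2 ** num_pools)
    return side * side * channels


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


features = flattened_features()
print(features, dense_params(features, 2))  # 8192 16386
```

With the VGG19 layers frozen, these 16,386 weights are the only ones updated during training.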

Step 6: Train the model

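We will now train our neural network model for 20 epochs, passing in the validation dataset so that performance on unseen data can be monitored during training. Note that recent Keras releases deprecate fit_generator in favour of fit, which accepts the same generators.

# Fit the model on train data along with validation data
model_history = model.fit_generator(
    generator=train_generator,
    steps_per_epoch=len(train_generator) // 32,
    epochs=20,
    validation_data=val_generator,
    validation_steps=len(val_generator) // 32,
)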
Epoch 1/20
9/9 [==============================] - 31s 3s/step - loss: 0.6622 - accuracy: 0.6632
Epoch 2/20
9/9 [==============================] - 30s 3s/step - loss: 0.2757 - accuracy: 0.9028
Epoch 3/20
9/9 [==============================] - 31s 3s/step - loss: 0.1645 - accuracy: 0.9444
Epoch 4/20
9/9 [==============================] - 30s 3s/step - loss: 0.1694 - accuracy: 0.9410
Epoch 5/20
9/9 [==============================] - 30s 3s/step - loss: 0.0984 - accuracy: 0.9669
Epoch 6/20
9/9 [==============================] - 31s 3s/step - loss: 0.1003 - accuracy: 0.9688
Epoch 7/20
9/9 [==============================] - 32s 3s/step - loss: 0.1194 - accuracy: 0.9444
Epoch 8/20
9/9 [==============================] - 30s 3s/step - loss: 0.0736 - accuracy: 0.9792
Epoch 9/20
9/9 [==============================] - 31s 3s/step - loss: 0.0519 - accuracy: 0.9965
Epoch 10/20
9/9 [==============================] - 31s 3s/step - loss: 0.0663 - accuracy: 0.9722
Epoch 11/20
9/9 [==============================] - 33s 4s/step - loss: 0.0799 - accuracy: 0.9653
Epoch 12/20
9/9 [==============================] - 30s 3s/step - loss: 0.0680 - accuracy: 0.9688
Epoch 13/20
9/9 [==============================] - 29s 3s/step - loss: 0.0727 - accuracy: 0.9792
Epoch 14/20
9/9 [==============================] - 28s 3s/step - loss: 0.0647 - accuracy: 0.9757
Epoch 15/20
9/9 [==============================] - 31s 3s/step - loss: 0.0680 - accuracy: 0.9826
Epoch 16/20
9/9 [==============================] - 29s 3s/step - loss: 0.0875 - accuracy: 0.9669
Epoch 17/20
9/9 [==============================] - 30s 3s/step - loss: 0.0500 - accuracy: 0.9931
Epoch 18/20
9/9 [==============================] - 30s 3s/step - loss: 0.0553 - accuracy: 0.9861
Epoch 19/20
9/9 [==============================] - 30s 3s/step - loss: 0.0504 - accuracy: 0.9792
Epoch 20/20
9/9 [==============================] - 30s 3s/step - loss: 0.0484 - accuracy: 0.9861
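
The step count in the log follows from how Keras generators report their length: `len(generator)` is the number of batches, not the number of images. The sketch below works through the arithmetic; the figure of 10,000 training images is an assumption for illustration, and the helper name is our own.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


full_pass = batches_per_epoch(10_000, 32)  # hypothetical training set size
print(full_pass)        # 313
print(full_pass // 32)  # 9
```

Under that assumption, dividing the batch count by 32 again gives 9 steps, the same number the log above reports per epoch, so each epoch covers only part of the training set. Setting steps_per_epoch to `len(train_generator)` would make each epoch a full pass over the data.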

Step 7: Evaluate the model performance on test set

# Evaluate model performance on test data
model_loss, model_acc = model.evaluate(test_generator)
print("Model has a loss of %.2f and accuracy %.2f%%" % (model_loss, model_acc*100))
25/25 [==============================] - 78s 3s/step - loss: 0.0544 - accuracy: 0.9825
Model has a loss of 0.05 and accuracy 98.25%
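
Accuracy alone does not show how often the model misses people without masks, which is the case that matters most here. As a minimal sketch, precision and recall for the "No Mask" class can be computed from the model's output scores with NumPy; the function name and the choice of class 1 as the positive class are assumptions for illustration.

```python
import numpy as np


def binary_metrics(y_true, y_score, positive_class=1):
    """Accuracy, precision and recall, treating `positive_class` as the class of interest."""
    y_true = np.asarray(y_true)
    y_pred = np.argmax(np.asarray(y_score), axis=1)  # predicted class per sample
    is_pos_true = y_true == positive_class
    is_pos_pred = y_pred == positive_class
    tp = int(np.sum(is_pos_true & is_pos_pred))   # correctly flagged
    fp = int(np.sum(~is_pos_true & is_pos_pred))  # flagged by mistake
    fn = int(np.sum(is_pos_true & ~is_pos_pred))  # missed
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

With the generators from Step 4, the true labels would come from `test_generator.classes` and the scores from `model.predict(test_generator)`, provided the generator is created with `shuffle=False` so that the two stay in the same order.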

Step 8: Save the trained model

We can also choose to save the trained model as an h5 file for future use.
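
The saving code itself is not shown here, so the following is only a minimal sketch. It assumes `model` is the trained Keras model from the previous steps and relies on the `save` method that Keras models provide. The helper name `save_model_to` and the file name `face_mask_model.h5` are hypothetical.

```python
from pathlib import Path


def save_model_to(model, path="face_mask_model.h5"):  # hypothetical file name
    """Create the parent directory if needed, then call the model's save method."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    model.save(str(path))
    return path
```

A model saved this way can later be restored with `keras.models.load_model(path)`, which avoids retraining before running predictions.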


Step 9: Test the face mask classifier model on the sample image

Lastly, we will test the trained model on our use case of detecting faces and masks for a group of people. We crop each face detected in the image and then use the trained model to predict whether it is wearing a mask.

# label for mask detection
mask_det_label = {0: "Mask", 1: "No Mask"}
mask_det_label_colour = {0: (0, 255, 0), 1: (255, 0, 0)}
pad_y = 1  # padding for result text
main_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # coloured output image in RGB order
# For detected faces in the image
for i in range(len(return_faces)):
    (x, y, w, h) = return_faces[i]
    cropped_face = main_img[y : y + h, x : x + w]
    cropped_face = cv2.resize(cropped_face, (128, 128))
    cropped_face = np.reshape(cropped_face, [1, 128, 128, 3]) / 255.0
    mask_result = model.predict(cropped_face)  # make model prediction
    print_label = mask_det_label[mask_result.argmax()]  # get mask/no mask based on prediction
    label_colour = mask_det_label_colour[mask_result.argmax()]  # green for mask, red for no mask
    # Print result
    (t_w, t_h), _ = cv2.getTextSize(
        print_label, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1
    )  # getting the text size
    cv2.rectangle(
        main_img,
        (x, y + pad_y),
        (x + t_w, y - t_h - pad_y - 6),
        label_colour,
        -1,  # filled background for the label text
    )  # draw rectangle
    cv2.putText(
        main_img,
        print_label,
        (x, y - 6),
        cv2.FONT_HERSHEY_SIMPLEX,
        0.4,
        (255, 255, 255),  # white
        1,
    )  # print text
    cv2.rectangle(
        main_img,
        (x, y),
        (x + w, y + h),
        label_colour,
        1,
    )  # draw bounding box on face
plt.figure(figsize=(10, 10))
plt.imshow(main_img)  # display image
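
The label lookup in the loop above relies on `argmax` picking the index of the larger of the two output scores. As a small sketch, the same decoding can be written as a helper that also returns the winning score, which is useful for ignoring low-confidence predictions. The helper name is illustrative, and because the output layer uses a sigmoid activation the two scores need not sum to 1.

```python
import numpy as np

MASK_LABELS = {0: "Mask", 1: "No Mask"}


def decode_prediction(mask_result, labels=MASK_LABELS):
    """Return (label, score) for a single prediction of shape (1, 2) or (2,)."""
    scores = np.asarray(mask_result).reshape(-1)
    idx = int(scores.argmax())  # index of the highest score
    return labels[idx], float(scores[idx])
```

For example, `decode_prediction(model.predict(cropped_face))` would return the label together with its score for one face crop.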

The results from our trained model for some sample images are shown below.

Learning References

  1. HaarCascade Github repository
  2. Keras Github repository
  3. Object Detection using HaarCascade
  4. Understand the VGG19 Architecture
  5. Sample Images Dataset from Kaggle

Learning Strategies

  1. Deep Learning is a type of machine learning algorithm that uses neural networks for performing its predictions. The key to learning about neural networks effectively is to learn and visualize the whole architecture of the system. By doing this, we can easily understand how the data is being processed step-by-step.
  2. It is also good practice to print or log important messages and errors to help with debugging.
  3. Neural networks are very sensitive to hyperparameters, so it is important to tune them carefully to improve the model’s accuracy and overall performance.

Reflective Analysis

Working on this project was challenging mainly in terms of taking into account more than one face in images. The detection accuracy also depends on a variety of factors such as lighting, time of the day, and the orientation of the person’s face in front of the camera. Thankfully, the training dataset I used had most of these conditions which made it easier for the model to be trained for worst-case scenarios.

Additionally, VGG19 is a CNN architecture that can be easily modified to suit the needs of the problem, which makes it versatile to use and build upon.

Conclusions and Future Directions

In conclusion, the results generated from the model were reasonably satisfactory, with 98.25% accuracy in the evaluation step. We can improve the model performance by training for more epochs and on more training images. Additionally, we can extend this use case to live surveillance cameras (CCTV) to detect people without masks and monitor social distancing in real time.

Also, the code for this project is available on GitHub and Kaggle.

Additionally, if you’re looking to do similar innovative projects in Deep Learning, you might be interested in this project on How to generate unique architectures using GANs.

Hire the author: Merishna S

Artificial intelligence & Machine Learning engineer. Has strong fundamentals in machine learning algorithms (neural networks, dimensionality reduction, feature utilization, and extraction and clustering), programming, statistics, and mathematics.