
Image Recognition with Keras: Convolutional Neural Networks

Image recognition and classification is a rapidly growing field in the area of machine learning. In particular, object recognition is a key feature of image classification, and the commercial implications of this are vast.

For instance, image classifiers will increasingly be used to:

These are just a few of many examples of how image classification will ultimately shape the future of the world we live in.

So, let’s take a look at an example of how we can build our own image classifier.

Model Training with VGG16

VGG16 is a convolutional neural network that ships with Keras and comes with weights pre-trained on the ImageNet dataset for image recognition.

Technically, it is possible to gather training and test data independently to build the classifier. However, this would necessitate at least 1,000 images, with 10,000 or greater being preferable.

In this regard, it is much easier to use a pre-trained neural network that has already been designed for image classification purposes.

Purpose of model

The following are two images of traffic in Madrid, Spain, and Stockholm, Sweden:





The purpose of building an image classifier in this instance is to correctly identify the presence of a vehicle in the image.

As an example, autonomous cars need to have the ability to detect the presence of traffic in real-time in order to avoid a collision. Image recognition (or instantaneously examining each frame in a video for the presence of an object) is how this would be accomplished.

Let’s start with the image of traffic in Madrid city centre.


Firstly, the libraries are imported and the predictions are generated.

# Import Libraries
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16
from keras import backend as K
import matplotlib.pyplot as plt
import numpy as np

# Process Model
model = VGG16()
image = load_img('madrid.jpg', target_size=(224, 224))
image = img_to_array(image)
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
image = preprocess_input(image)

# Generate predictions
pred = model.predict(image)
print('Predicted:', decode_predictions(pred, top=3)[0])

Here are the generated predictions:

>>> print('Predicted:', decode_predictions(pred, top=3)[0])
Predicted: [('n03788195', 'mosque', 0.578081), ('n03220513', 'dome', 0.16524781), ('n03837869', 'obelisk', 0.08766182)]

We see that for this instance, the classifier did not do a particularly good job. It did not identify the presence of cars in the image, and incorrectly identified the train station as a mosque.
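Before diagnosing the result, it can help to be explicit about what the model actually receives. As a rough sketch (not the Keras source itself), the reshape and preprocess_input steps above amount to adding a batch axis, reordering the channels from RGB to BGR, and subtracting the per-channel ImageNet means without any further scaling. The function name vgg16_style_preprocess and the dummy grey image are assumptions made purely for illustration:

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by VGG16-style preprocessing
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def vgg16_style_preprocess(rgb_image):
    """Approximate the reshape + preprocess_input steps for one (H, W, 3) RGB image."""
    x = rgb_image.astype(np.float64)
    x = x[..., ::-1]              # RGB -> BGR
    x = x - IMAGENET_MEAN_BGR     # zero-centre each channel, no rescaling
    return x[np.newaxis, ...]     # add the batch axis -> (1, H, W, 3)

# Stand-in for the loaded 224x224 photo (hypothetical uniform grey image)
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = vgg16_style_preprocess(dummy)
print(batch.shape)
```

Note that the whole photo is squeezed into 224 x 224 pixels before classification, so small objects such as cars occupy only a handful of pixels in the model's input.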


To diagnose this further, it is helpful to use what is called a heatmap. This allows us to determine where precisely the neural network is “zooming in” on the image to make a classification.

Firstly, a Grad-CAM algorithm can be used to generate the heatmap:

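# Grad-CAM algorithm
# Note: K.gradients requires graph (non-eager) execution, as in TensorFlow 1.x
specoutput = model.output[:, 668]  # score for the top predicted class (index 668, 'mosque')
last_conv_layer = model.get_layer('block5_conv3')  # last convolutional layer of VGG16
grads = K.gradients(specoutput, last_conv_layer.output)[0]  # gradient of the class score w.r.t. the feature maps
pooled_grads = K.mean(grads, axis=(0, 1, 2))  # one importance weight per channel
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([image])
# Weight each of the 512 feature-map channels by its pooled gradient
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]

# Average over channels to obtain a single 14x14 heatmap
heatmap = np.mean(conv_layer_output_value, axis=-1)

# Heatmap post processing: drop negative values and scale to [0, 1]
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)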
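To make the arithmetic of the Grad-CAM step easier to follow, here is a minimal NumPy sketch of the same weighting applied to stand-in arrays. It assumes the activations and gradients of the last convolutional layer are already available as (height, width, channels) arrays without a batch axis. The function name grad_cam_heatmap and the random inputs are illustrative only, and the guard against an all-zero heatmap is an addition that the snippet above does not have:

```python
import numpy as np

def grad_cam_heatmap(conv_output, grads):
    """Combine feature maps and gradients, both of shape (H, W, C), into an (H, W) heatmap."""
    pooled_grads = grads.mean(axis=(0, 1))     # one weight per channel, shape (C,)
    weighted = conv_output * pooled_grads      # broadcast the weights over H and W
    heatmap = weighted.mean(axis=-1)           # average over channels -> (H, W)
    heatmap = np.maximum(heatmap, 0)           # keep only positive evidence
    peak = heatmap.max()
    if peak > 0:                               # avoid dividing by zero
        heatmap = heatmap / peak
    return heatmap

# Random stand-ins with the shape of VGG16's block5_conv3 output for a 224x224 input
rng = np.random.default_rng(0)
demo_conv_output = rng.random((14, 14, 512))
demo_grads = rng.normal(size=(14, 14, 512))
demo_heatmap = grad_cam_heatmap(demo_conv_output, demo_grads)
print(demo_heatmap.shape)
```

Multiplying by the pooled gradients in one broadcast operation is equivalent to the explicit loop over the 512 channels.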


Now, the heatmap is superimposed. In other words, we can view where the zones in the heatmap occur and where the classification is being centered.

# Superimposing heatmap
import cv2
img = cv2.imread('madrid.jpg')  # the image that was classified above
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))  # match the image size (width, height)
heatmap = np.uint8(255 * heatmap)  # scale to 0-255
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
superimposed_img = heatmap * 0.4 + img  # 0.4 is the heatmap intensity factor
cv2.imwrite('heatmap4.jpg', superimposed_img)

Here is the superimposed image:


We can see that the heatmap is not focused on the cars in the image. Rather, the focus is on the large building and the water fountain as indicated by the zones in the image.
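The superimposing step can also be sketched without OpenCV, which makes the arithmetic explicit. This simplified version assumes the image dimensions are exact multiples of the heatmap dimensions, uses nearest-neighbour upsampling, and paints the heatmap into the red channel only instead of applying the JET colour map. The function name overlay_heatmap is hypothetical, and the explicit clipping to the 0-255 range is an addition:

```python
import numpy as np

def overlay_heatmap(img_bgr, heatmap, alpha=0.4):
    """Blend a [0, 1] heatmap onto a BGR uint8 image of shape (H, W, 3)."""
    h, w = img_bgr.shape[:2]
    fh, fw = h // heatmap.shape[0], w // heatmap.shape[1]   # integer scale factors
    upsampled = np.repeat(np.repeat(heatmap, fh, axis=0), fw, axis=1)
    colour = np.zeros(img_bgr.shape, dtype=np.float64)
    colour[..., 2] = 255.0 * upsampled                      # red is channel 2 in BGR order
    blended = img_bgr.astype(np.float64) + alpha * colour
    return np.clip(blended, 0, 255).astype(np.uint8)        # keep values in the valid pixel range

# Tiny stand-in image and heatmap (not real data)
demo_img = np.full((4, 4, 3), 200, dtype=np.uint8)
demo_map = np.array([[1.0, 0.0], [0.0, 0.0]])
demo_overlay = overlay_heatmap(demo_img, demo_map)
print(demo_overlay[0, 0], demo_overlay[3, 3])
```

Only the region where the heatmap is non-zero changes colour, which is what makes the zones stand out in the superimposed image.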

Cropping images with PIL

One way to alleviate this is to crop the image into four separate quadrants using the PIL library.

Considering each area of the image in isolation increases the probability that the classifier will detect the cars that are present somewhere in the image as a whole.

To do this, the image is first resized to a fixed base height (preserving its aspect ratio) and then cropped into four equal quadrants.

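from PIL import Image

# Resize to a fixed height while preserving the aspect ratio
baseheight = 700
img = Image.open('madrid.jpg')
hpercent = baseheight / float(img.size[1])
width = int(float(img.size[0]) * hpercent)
img = img.resize((width, baseheight), Image.LANCZOS)  # ANTIALIAS is the older name for this filter

# Split points for the quadrants
# (not defined in the original snippet; assumed here to be the midpoints)
w1 = width // 2
h1 = baseheight // 2

# Crop boxes are (left, upper, right, lower).
# Each assignment overwrites croppedIm, so save or classify each crop before taking the next.
croppedIm = img.crop((0, 0, w1, h1))  # top-left quadrant

croppedIm = img.crop((0, h1, w1, baseheight))  # bottom-left quadrant

croppedIm = img.crop((w1, 0, width, h1))  # top-right quadrant

croppedIm = img.crop((w1, h1, width, baseheight))  # bottom-right quadrant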
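The resize arithmetic and the four crop boxes follow a simple pattern, which the sketch below captures as two small helpers in plain Python. It assumes the split points are the midpoints of the resized image; the function names scaled_size and quadrant_boxes, and the example dimensions, are hypothetical and not taken from the photo used here:

```python
def scaled_size(orig_width, orig_height, base_height):
    """Return (width, height) after resizing to base_height with the aspect ratio kept."""
    scale = base_height / float(orig_height)
    return int(orig_width * scale), base_height

def quadrant_boxes(width, height):
    """Return (left, upper, right, lower) boxes: top-left, bottom-left, top-right, bottom-right."""
    w1, h1 = width // 2, height // 2   # assumed split points: the midpoints
    return [
        (0, 0, w1, h1),
        (0, h1, w1, height),
        (w1, 0, width, h1),
        (w1, h1, width, height),
    ]

# Example dimensions only, chosen for illustration
demo_width, demo_height = scaled_size(2000, 1400, 700)
for box in quadrant_boxes(demo_width, demo_height):
    print(box)
```

Each tuple can be passed directly to img.crop(), which makes it straightforward to loop over all four quadrants rather than writing each crop by hand.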

Here are the four cropped images:

Cropped Image 1


Cropped Image 2


Cropped Image 3


Cropped Image 4


Cropped image 4 (the one with the cars present) was then re-run through the classifier:

>>> print('Predicted:', decode_predictions(pred, top=3)[0])
Predicted: [('n03770679', 'minivan', 0.12950829), ('n02930766', 'cab', 0.113266684), ('n04461696', 'tow_truck', 0.09845059)]

We see that the classifier detects the presence of a vehicle in the image (probabilities appear for the terms ‘minivan’, ‘cab’, and ‘tow_truck’). While the probabilities themselves are quite low, the model has correctly identified the general category of the object in the image, i.e. a vehicle, and is therefore considered suitable for this purpose. For instance, an autonomous car needs to detect the presence of a vehicle on the road first and foremost, rather than necessarily classifying that vehicle as a cab or a minivan.
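One way to express this "general category" reasoning in code is to add up the probabilities of every top prediction that counts as a vehicle. The sketch below is only an illustration: the set VEHICLE_LABELS is a hand-picked, hypothetical grouping rather than anything provided by Keras or ImageNet, and vehicle_score is an assumed helper name. The input lists are the top-3 predictions reported above for the full Madrid image and for cropped image 4:

```python
# Hypothetical, hand-picked set of ImageNet labels treated as "vehicle"
VEHICLE_LABELS = {'minivan', 'cab', 'tow_truck', 'streetcar', 'trolleybus'}

def vehicle_score(decoded):
    """Sum the probabilities of the decoded predictions whose label is in VEHICLE_LABELS."""
    return sum(prob for _, label, prob in decoded if label in VEHICLE_LABELS)

madrid_full = [('n03788195', 'mosque', 0.578081),
               ('n03220513', 'dome', 0.16524781),
               ('n03837869', 'obelisk', 0.08766182)]
madrid_crop4 = [('n03770679', 'minivan', 0.12950829),
                ('n02930766', 'cab', 0.113266684),
                ('n04461696', 'tow_truck', 0.09845059)]

print('Full image vehicle score:', vehicle_score(madrid_full))
print('Cropped image 4 vehicle score:', vehicle_score(madrid_crop4))
```

For the cropped image the three vehicle labels together account for roughly 0.34 of the probability mass, whereas none of the top-3 labels for the full image is a vehicle.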

Here are the heatmaps for the cropped image:



Superimposed heatmap


We now see that the focus is on the car for the cropped image.

Analysis for Stockholm

The same VGG16 model was then applied to the Stockholm image, and these were the results and heatmap:

>>> print('Predicted:', decode_predictions(pred, top=3)[0])
Predicted: [('n03877845', 'palace', 0.7787534), ('n04335435', 'streetcar', 0.151575), ('n04486054', 'triumphal_arch', 0.013562491)]


As we can see, the zone in the heatmap is focused around the building rather than the vehicles, with the classifier identifying the building in question as a ‘palace’.

Again, the image was cropped into four separate quadrants:

Cropped Image 1


Cropped Image 2


Cropped Image 3


Cropped Image 4


Let’s take the example of image 4 – the image of the bus.

Upon running the classifier, the following results were obtained:

>>> print('Predicted:', decode_predictions(pred, top=3)[0])
Predicted: [('n04335435', 'streetcar', 0.59897834), ('n04487081', 'trolleybus', 0.072266325), ('n06874185', 'traffic_light', 0.065069936)]

The top three predictions are ‘streetcar’, ‘trolleybus’, and ‘traffic_light’. The classifier now recognises that the vehicle is a larger one, which only becomes possible once the image of the bus is considered in isolation.

Here is the heatmap for the cropped image:



Superimposed heatmap


Again, the focus is now on the bus in the heatmap for the cropped image.


In this example, you have seen: