In the previous blog, we discussed how global thresholding can be a tedious task when dealing with images having non-uniform illumination. This is because you need to ensure that while subdividing an image, each sub-image histogram is bimodal. Otherwise, the segmentation task will fail.
In this blog, we will discuss adaptive thresholding, which works well under varying conditions such as non-uniform illumination. Here, the threshold value is calculated separately for each pixel using some statistic obtained from its neighborhood. This way we get different thresholds for different image regions, which tackles the problem of varying illumination.
The whole procedure can be summed up as:
For each pixel in the image
Calculate the statistics (such as mean, median, etc.) from its neighborhood. This will be the threshold value for that pixel.
Compare the pixel value with this threshold
Now, let’s discuss the OpenCV function for adaptive thresholding.
thresholdType: This tells us what value to assign to pixels greater/less than the threshold. Must be either THRESH_BINARY or THRESH_BINARY_INV. (You can read more about it here).
maxValue: This is the value assigned to the pixels after thresholding. This depends on the thresholding type. If the type is cv2.THRESH_BINARY, all the pixels greater than the threshold are assigned this maxValue.
adaptiveMethod: This tells us how the threshold is calculated from the pixel neighborhood. This currently supports two methods:
cv2.ADAPTIVE_THRESH_MEAN_C: In this, the threshold value is the mean of the neighborhood area.
cv2.ADAPTIVE_THRESH_GAUSSIAN_C: In this, the threshold value is the weighted sum of the neighborhood area. The weights are Gaussian, computed using the getGaussianKernel() method. You can read more about it here.
blockSize: This is the neighborhood size. It must be an odd number (3, 5, 7, and so on).
C: a constant that is subtracted from the computed mean or weighted mean to obtain the final threshold.
As discussed, OpenCV only provides the mean and the Gaussian-weighted mean to serve as the threshold. But don’t limit yourself to these two statistics. Try other statistics like standard deviation, median, etc. by writing your own helper function. Let’s see how to use this.
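A minimal sketch of cv2.adaptiveThreshold() (the image filename and the blockSize/C values below are just placeholders):

import cv2

img = cv2.imread('image.jpg', 0)          # hypothetical greyscale image

# mean of the 11x11 neighborhood minus C=2 serves as the per-pixel threshold
th_mean = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 11, 2)

# Gaussian-weighted mean of the 11x11 neighborhood minus C=2
th_gauss = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)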
In the previous blogs, we discussed different methods for automatically finding the global threshold for an image. For instance, the iterative method, Otsu’s method, etc. In this blog, we will discuss another very simple approach for automatic thresholding – Balanced histogram thresholding. As clear from the name, this method tries to automatically find the threshold by balancing the image histogram. Let’s understand this method in detail.
Note: This method assumes that the image histogram is bimodal and a reasonable contrast ratio exists between the background and the region of interest.
Concept
Suppose you have a perfectly balanced histogram, i.e. a histogram where the distributions of the background and the ROI are the same. If you place such a histogram over a lever, it will be balanced, and the optimum threshold will be at the center of the lever, as shown in the figure below.
This is the main idea behind the Balanced Histogram Thresholding. This method tries to balance the image histogram and then infer the threshold value from that.
But in real-life situations, we don’t encounter images with such perfectly balanced histograms. So, let’s see how this method balances the unbalanced histograms.
First, it places the histogram over the lever and calculates the center point.
Then it calculates the left-side and right-side weights about this center point.
It removes weight from the heavier side and adjusts the center.
The last two steps are repeated until the start and end points are equal to the center.
The whole procedure can be summed up in the below gif (taken from Wikipedia)
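A minimal Python sketch of this balancing procedure, adapted from the pseudocode on the Wikipedia page (it takes a 256-bin histogram and returns the balance point as the threshold):

import numpy as np

def balanced_hist_threshold(hist):
    # hist: 1-D array of 256 bin counts
    i_s, i_e = 0, len(hist) - 1              # start and end points of the lever
    i_m = (i_s + i_e) // 2                   # center of the lever
    w_l = np.sum(hist[i_s:i_m + 1])          # weight on the left side
    w_r = np.sum(hist[i_m + 1:i_e + 1])      # weight on the right side
    while i_s <= i_e:
        if w_r > w_l:                        # right side is heavier
            w_r -= hist[i_e]
            i_e -= 1
            if (i_s + i_e) // 2 < i_m:
                w_r += hist[i_m]
                w_l -= hist[i_m]
                i_m -= 1
        else:                                # left side is heavier
            w_l -= hist[i_s]
            i_s += 1
            if (i_s + i_e) // 2 >= i_m and i_m + 1 < len(hist):
                w_l += hist[i_m + 1]
                w_r -= hist[i_m + 1]
                i_m += 1
    return i_m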
In the previous blog, we discussed global thresholding and how to find the global threshold using the iterative approach. In this blog, we will discuss Otsu’s method, named after Nobuyuki Otsu, that automatically finds the global threshold. So, let’s discuss this method in detail.
Note: This method assumes that the image histogram is bimodal and a reasonable contrast ratio exists between the background and the region of interest.
In simple terms, Otsu’s method tries to find a threshold value that minimizes the weighted within-class variance. Since variance is the spread of the distribution about the mean, minimizing the within-class variance will tend to make the classes compact.
Let’s say we threshold a histogram at a value “t”. This produces two regions – left and right of “t” – whose variances are given by $\sigma_0^2(t)$ and $\sigma_1^2(t)$. Then the weighted within-class variance is given by

$\sigma_w^2(t) = w_0(t)\,\sigma_0^2(t) + w_1(t)\,\sigma_1^2(t)$

where $w_0(t)$ and $w_1(t)$ are the weights given to each class. Weights are the total pixels in a thresholded region (left or right) divided by the total image pixels. Let’s take a simple example to understand how to calculate these.
Suppose we have the following histogram and we want to find the weighted within-class variance corresponding to threshold value 1.
Below are the weights and the variances calculated for the left and right regions obtained after thresholding at value 1.
Similarly, we will iterate over all the possible threshold values and calculate the weighted within-class variance for each of them. The optimum threshold will be the one with the minimum within-class variance.
The gif below shows how the within-class variance (blue dots) varies with the threshold value for the above histogram. The optimum threshold value is the one where the within-class variance is minimum.
OpenCV also provides a builtin function to calculate the threshold using this method.
OpenCV
You just need to pass an extra flag, cv2.THRESH_OTSU in the cv2.threshold() function which we discussed in the previous blog. The optimum threshold value will be returned by this along with the thresholded image. Let’s see how to use this.
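A minimal sketch of this (the filename is just a placeholder; passing 0 as the threshold lets Otsu’s method pick it):

import cv2

img = cv2.imread('image.jpg', 0)    # hypothetical greyscale image

# the extra cv2.THRESH_OTSU flag makes OpenCV compute the threshold itself
ret, otsu_img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(ret)                          # the optimum threshold found by Otsu's method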
We all know that minimizing the within-class variance is equivalent to maximizing the between-class variance. This maximization can be implemented recursively, using running sums, and is faster than the earlier method. The expression for the between-class variance is given by

$\sigma_b^2(t) = w_0(t)\,w_1(t)\,[\mu_0(t) - \mu_1(t)]^2$

where $\mu_0(t)$ and $\mu_1(t)$ are the means of the two classes.
Below are the steps to calculate recursively between-class variance.
Calculate the histogram of the image.
Set up weights and means corresponding to the “0” threshold value.
Loop through all the threshold values
Update the weights and the mean
Calculate the between-class variance
The optimum threshold will be the one with the max variance.
Below is the code in Python that implements the above steps.
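A minimal sketch of these steps (the filename is a placeholder; running sums give a cheap update per threshold):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)                        # hypothetical greyscale image

# Step 1: histogram of the image
hist = cv2.calcHist([img], [0], None, [256], [0, 256]).ravel()
total = hist.sum()
bins = np.arange(256)
sum_total = (bins * hist).sum()

# Step 2: weights and sums corresponding to the "0" threshold value
w0, sum0 = 0.0, 0.0
best_var, best_t = -1.0, 0

# Step 3: loop through all the threshold values
for t in range(256):
    w0 += hist[t]                                       # update weight of class 0
    if w0 == 0:
        continue
    w1 = total - w0                                     # weight of class 1
    if w1 == 0:
        break
    sum0 += t * hist[t]
    m0 = sum0 / w0                                      # mean of class 0
    m1 = (sum_total - sum0) / w1                        # mean of class 1
    var_between = w0 * w1 * (m0 - m1) ** 2              # between-class variance (up to a constant)
    if var_between > best_var:
        best_var, best_t = var_between, t

print(best_t)                                           # optimum threshold (maximum variance)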
This is how you can implement Otsu’s method recursively if you consider maximizing the between-class variance. Now, let’s discuss the limitations of this method.
Limitations
Otsu’s method is only guaranteed to work when
The histogram is bimodal.
A reasonable contrast ratio exists between the background and the ROI.
The lighting conditions are uniform.
The image is not affected by noise.
The sizes of the background and the ROI are comparable.
There are many modifications done to the original Otsu’s algorithm to address these limitations such as two-dimensional Otsu’s method etc. We will discuss some of these modifications in the following blogs.
In the following blogs, we will also discuss how to counter these limitations so as to get satisfactory results with Otsu’s method. Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In the previous blog, we discussed Otsu’s method for automatic image thresholding and its limitations. In this blog, we will discuss how to handle these limitations so as to produce satisfactory thresholding results. So, let’s get started.
Case-1: When the noise is present in the image
If noise is present in the image, it tends to change the modality of the histogram; the sharp valleys between the peaks of the bimodal histogram start degrading. In that case, Otsu’s method or any other global thresholding method will fail. So, in order to find the global threshold, one should first remove the noise using a smoothing filter like a Gaussian, etc. and then apply an automatic thresholding method like Otsu’s.
Case-2: When the object area is small compared to the background area
In this case, the image histogram will be dominated by the large background area. This increases the probability of any pixel belonging to the background, so the histogram no longer exhibits bimodality and Otsu’s method will result in segmentation errors. To prevent this, one should only consider pixels that lie on or near the edges between the objects and the background. Doing so results in an image histogram with peaks of approximately the same size. Then we can apply any automatic thresholding method like Otsu’s. Below are the steps to implement the above procedure, followed by a small code sketch.
Calculate the edge image using any high pass filter like Sobel, Laplacian, etc.
Select any threshold value (T).
Threshold the above edge image to produce a binary mask.
Apply the mask image on the input image using any bitwise operations or any other method.
This keeps only those pixels where the mask image is white.
Compute the histogram of these pixels only.
Finally, apply any automatic global thresholding method like otsu, etc.
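A minimal sketch of these steps (the filename, the Laplacian as the high-pass filter, and the 99th-percentile edge threshold are all placeholder choices):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)                       # hypothetical greyscale image

# Steps 1-3: edge image, threshold it to get a binary mask of strong edges
edges = np.absolute(cv2.Laplacian(img, cv2.CV_64F))
T = np.percentile(edges, 99)                           # keep only the strongest edge responses
mask = edges >= T

# Steps 4-6: keep only the pixels under the mask and use their histogram
edge_pixels = img[mask].reshape(-1, 1)

# Step 7: Otsu on these pixels only, then threshold the full image with the result
t, _ = cv2.threshold(edge_pixels, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
segmented = np.uint8(img > t) * 255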
Case-3: When the image is taken under non-uniform illumination conditions
In this case, the histogram no longer remains bimodal and thus we will not be able to segment the image satisfactorily. One of the simplest approaches is to subdivide the image into non-overlapping rectangles, with the rectangle size chosen such that the illumination is nearly constant within each one. Then we apply a global thresholding technique like Otsu’s to each of these rectangles, as sketched below.
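A minimal sketch of this block-wise approach (the filename and the 2×3 grid are placeholder choices):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)            # hypothetical greyscale image
h, w = img.shape
rows, cols = 2, 3                           # assumed grid of non-overlapping rectangles
out = np.zeros_like(img)

for r in range(rows):
    for c in range(cols):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        x0, x1 = c * w // cols, (c + 1) * w // cols
        block = img[y0:y1, x0:x1]
        # Otsu on each rectangle separately
        _, out[y0:y1, x0:x1] = cv2.threshold(block, 0, 255,
                                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)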
The above procedure only works when the sizes of the object and the background are comparable within each rectangle. This is quite intuitive, as only then will we have a bimodal histogram. Taking care of the background and the object sizes in each rectangle is a tedious task.
So, in the next blog, we will discuss adaptive thresholding that works pretty well for the above conditions. That’s all for this blog. Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In the previous blog, we discussed image thresholding and when to use this for image segmentation. We also learned that thresholding can be global or adaptive depending upon how the threshold value is selected.
In this blog, we will discuss
global thresholding
OpenCV function for global thresholding
How to choose threshold value using the iterative algorithm
In global thresholding, each pixel value in the image is compared with a single (global) threshold value. Below is the code for this.
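A minimal sketch of global thresholding with NumPy (the filename and the threshold value are placeholders):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)      # hypothetical greyscale image

threshold = 150                       # assumed global threshold value
val_high = 255                        # value for pixels above the threshold
val_low = 0                           # value for the remaining pixels

out = np.where(img > threshold, val_high, val_low).astype(np.uint8)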
Here, we assign a value of “val_high” to all the pixels greater than the threshold, and “val_low” otherwise. OpenCV also provides a builtin function for thresholding the image. So, let’s take a look at that function.
OpenCV
cv2.threshold(src, thresh, maxval, type) → retval, dst
This function returns the thresholded image(dst) and the threshold value(retval). Its arguments are
src: input greyscale image (8-bit or 32-bit floating point)
thresh: global threshold value
type: Different types that decide “val_high” and “val_low“. In other words, these types decide what value to assign for pixels greater than and less than the threshold. Below figure shows different thresholding types available.
maxval: maximum value to be used with THRESH_BINARY and THRESH_BINARY_INV. Check the below image.
To specify the thresholding type, write “cv2.” as the prefix. For instance, write cv2.THRESH_BINARY if you want to use this type. Let’s take an example.
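A minimal sketch (the filename and the threshold value 150 are placeholders):

import cv2

img = cv2.imread('image.jpg', 0)     # hypothetical greyscale image

# pixels above 150 become 255 (maxval), the rest become 0
retval, dst = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY)
print(retval)                        # simply echoes the threshold we passed in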
Similarly, you can apply other thresholding types to check how they work. Till now we discussed how to threshold an image using a global threshold value. But we didn’t discuss how to get this threshold value. So, in the next section, let’s discuss this.
How to choose the threshold value?
As already discussed, global thresholding is a suitable approach only when the intensity distributions of the background and the ROI are sufficiently distinct, i.e. there is a clear valley between the peaks of the histogram. We can easily select the threshold value manually in that situation. But what if we have a large number of images? In that case, we don’t want to first check each image histogram and then decide the threshold value by hand. We want something that can automatically estimate the threshold value for each image. Below is the algorithm that can be used for this purpose.
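A minimal sketch of the standard iterative (mean-based) threshold selection, which is presumably the algorithm meant here (the filename and the 0.5 convergence tolerance are placeholders):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0).astype(np.float64)   # hypothetical greyscale image

T = img.mean()                      # step 1: initial estimate of the threshold
while True:
    m1 = img[img > T].mean()        # mean of the pixels above the threshold
    m2 = img[img <= T].mean()       # mean of the remaining pixels
    T_new = (m1 + m2) / 2           # new estimate
    if abs(T_new - T) < 0.5:        # stop once the estimate settles
        break
    T = T_new

print(T_new)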
Image segmentation is the process of subdividing an image into its constituent regions or objects. In many computer vision applications, image segmentation is very useful for detecting the region of interest, for instance, in medical imaging where we have to locate tumors, in object detection where self-driving cars have to detect pedestrians, traffic signals, etc., or in video surveillance. There are a number of methods available to perform image segmentation, for instance, thresholding, clustering methods, graph partitioning methods, and convolutional methods, to mention a few.
In this blog, we will discuss Image Thresholding which is one of the simplest methods for image segmentation. In this, we partition the images directly into regions based on the intensity values. So, let’s discuss image thresholding in greater detail.
Concept
If the pixel value is greater than a threshold value, it is assigned one value (maybe white), else it is assigned another value (maybe black).
In other words, if f(x,y) is the input image then the segmented image g(x,y) is given by
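In equation form, with T as the threshold and 1/0 standing in for the two output values:

$$g(x,y) = \begin{cases} 1, & \text{if } f(x,y) > T \\ 0, & \text{if } f(x,y) \le T \end{cases}$$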
If the threshold value T remains constant over the entire image, then this is known as global thresholding. When the value of T changes over the entire image or depends upon the pixel neighborhood, then this is known as adaptive thresholding. We will cover both these types in greater detail in the following blogs.
Applicability Condition
Thresholding is only guaranteed to work when a good contrast ratio between the region of interest and the background exists. Otherwise, the thresholding will not be able to fully detect the region of interest. Let’s understand this by an example.
Suppose we have two images from which we want to segment the square region (our region of interest) from the background.
Let’s plot the histogram of these two images.
Clearly, as expected, the histogram for “A“ shows two peaks corresponding to the square and the background. The separation between the peaks shows that the background and the ROI have a good contrast ratio. By choosing a threshold value between the peaks, we will be able to segment out the ROI. For “B”, on the other hand, the intensity distributions of the ROI and the background are not that distinct, so we may not be able to fully segment the ROI.
Thresholded images are shown below (How to choose a threshold value will be discussed in the next blog).
So, always plot the image histogram to check the contrast ratio between the background and the ROI. Only if the contrast ratio is good, choose the thresholding method for image segmentation. Otherwise, look for other methods.
In the next blog, we will discuss global thresholding and how to choose the threshold value using the iterative method. Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In the previous blog, we discussed the binary classification problem, where each image can contain only one class out of two classes. So, in this blog, we will extend this to the multi-class classification problem. In a multi-class problem, we classify each image into one of three or more classes. So, let’s get started.
Here, we will use the CIFAR-10 dataset, developed by the Canadian Institute for Advanced Research (CIFAR). The CIFAR-10 dataset consists of 60000 (32×32) color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The classes are completely mutually exclusive. Below are the classes in the dataset, as well as 10 random images from each class.
CIFAR-10 dataset can be downloaded by using any of the two methods:
Using Keras builtin datasets
From the official website
Method-1
Downloading using the Keras builtin datasets is pretty straightforward and simple. It’s already transformed into the shape appropriate for the CNN input. No headache, just write one line of code and you are done.
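That one line, using the Keras builtin datasets module, looks like this:

from keras.datasets import cifar10

# downloads CIFAR-10 on first use and returns arrays already shaped (N, 32, 32, 3)
(train_x, train_y), (test_x, test_y) = cifar10.load_data()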
Method-2
The data can also be downloaded from the official website. The only catch is that it is not in a format that can be fed directly to the model. Let’s see how the dataset is arranged.
The dataset is broken into 5 files so as to prevent your machine from running out of memory. Each file contains a dictionary of data and the corresponding labels. Data is a 10000×3072 array where 10000 is the number of images and 3072 are the pixel values in row-major order. So, the first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. You need to convert it into a (32,32) color image.
Steps:
First, unpickle all the train and test files
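Unpickling can be done with a small helper along the lines of the one given on the CIFAR website (a sketch; the key point is the 'bytes' encoding):

import pickle

def unpickle(file):
    # load one CIFAR-10 batch file into a dictionary of data and labels
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    return batch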
Then convert the image format to (width x height x num_channel)
import numpy as np

# convert the image format to (width x height x num_channel)
b = np.reshape(train_x, (50000, 3, 32, 32))
train_x = np.transpose(b, (0, 2, 3, 1))

# Normalize the data between 0 and 1
train_x = train_x.astype('float32') / 255
train_y = np.expand_dims(train_y, axis=-1)
Split the data into train and validation
Because the training data contains images in random order, a simple split will be sufficient. Another way is to take some percentage of images from each of the 5 train files to constitute a validation set.
x_train = train_x[:45000]
y_train = train_y[:45000]
val_x = train_x[45000:]
val_y = train_y[45000:]
To make sure that this splitting leads to the uniform proportion of examples for each class, we can plot the counts of each class in the validation dataset. Below is the bar plot. Looks like all the classes are uniformly distributed in the validation set.
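A small sketch of how such a bar plot can be produced from the split above (val_y comes from the split; class names are omitted for brevity):

import numpy as np
import matplotlib.pyplot as plt

# count how many validation images fall in each of the 10 classes
classes, counts = np.unique(val_y, return_counts=True)
plt.bar(classes, counts)
plt.xlabel('class id')
plt.ylabel('number of validation images')
plt.show()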
Model Architecture
Since the images contain a diverse amount of information, we will need a bigger network. But the bigger the network, the higher the chances of overfitting. So, to prevent this we may need to apply some regularization techniques.
In the previous blogs, we discussed binary and multi-class classification problems. Both of these are quite similar: the basic assumption underlying them is that each image can contain only one class. For instance, for the dogs vs cats classification, it was assumed that the image can contain either a cat or a dog but not both. So, in this blog, we will discuss the case where more than one class can be present in a single image. This type of classification is known as multi-label classification. The picture below explains this concept beautifully.
Some of the most common techniques for solving multi-label classification problems are
Problem Transformation
Adapted Algorithm
Ensemble approaches
Here, we will only discuss Binary Relevance, a method that falls under the Problem Transformation category. If you are curious about other methods, you can read this amazing review paper.
In binary relevance, we break the problem into a number of binary classification problems: for each available class, we ask whether it is present in the image or not. As we already know, binary classification uses ‘sigmoid‘ as the last-layer activation function and ‘binary_crossentropy‘ as the loss function, so we will use the same here. Everything else remains the same.
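To make this concrete, a minimal sketch of a model set up for binary relevance (the network body, the 25 classes, and the input size are assumptions for this dataset):

from keras import layers, models

num_classes = 25                                             # assumed number of genres

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(200, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(num_classes, activation='sigmoid'))   # one independent probability per label

model.compile(optimizer='adam',
              loss='binary_crossentropy',                    # binary cross-entropy per label
              metrics=['accuracy', 'top_k_categorical_accuracy'])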
Now, let’s take a dataset and see how to implement multi-label classification.
Problem Definition
Here, we will take the common problem of movie genre classification based on poster images. Because a movie can belong to more than one genre, for instance, comedy, romance, etc., this is a multi-label classification problem.
Dataset
You can download the original dataset from here. This contains two files.
Movie_Poster_Dataset.zip – The poster images
Movie_Poster_Metadata.zip – Metadata of each poster image like ID, genres, box office, etc.
To prepare the dataset, we need images and corresponding genre information. For this, we need to extract the genre information from the Movie_Poster_Metadata.zip file corresponding to each poster image. Let’s see how to do this.
Note: This dataset contains some missing items. For instance, check the “1982” folder in Movie_Poster_Dataset.zip and Movie_Poster_Metadata.zip: for some movies, the poster image or the corresponding genre information is missing. So, we need to perform EDA and remove these files.
Steps to perform EDA:
First, we will extract the movie name and corresponding genre information from the Movie_Poster_Metadata.zip file and create a Pandas dataframe using these.
Then we will loop over the poster images in the Movie_Poster_Dataset.zip file and check if it is present in the dataframe created above. If the poster is not present, we will remove that movie from the dataframe.
These two steps will ensure that we are only left with movies that have poster images and genre information. Below is the code for this.
Because the encoding of some metadata files is different, two for loops are used. Below are the steps performed in the code.
First, open the metadata file
Read line by line
Extract the information corresponding to the ‘Genre’ and ‘imdbID’
Append them into the list and create a dataframe
import pandas as pd

b = []          # list of genre lists
b2 = []         # list of poster filenames (imdbID + '.jpg')

# metadata for 1980-1981 is plain text
for i in range(1980, 1982):
    with open('D:/downloads/Movie_Poster_Dataset/groundtruth/{}.txt'.format(i), mode="r") as f:
        for lines in f.readlines():
            lines = lines.rstrip('\n')
            if 'imdbID' in lines:
                a2, b3, c2 = lines.partition(':')
                c2 = c2.lstrip(' "')
                c2 = c2.rstrip('",\n')
                b2.append(c2 + '.jpg')
            if "Genre" in lines:
                a, b1, c = lines.partition(':')
                c = c.lstrip(' "')
                c = c.rstrip('",\n')
                c1 = c.split(',')
                c1 = map(str.strip, c1)
                b.append(list(c1))

# metadata for 1982-2015 is UTF-16 encoded
for i in range(1982, 2016):
    with open('D:/downloads/Movie_Poster_Dataset/groundtruth/{}.txt'.format(i), mode="r", encoding='utf-16-le') as f:
        for lines in f.readlines():
            lines = lines.rstrip('\n')
            if 'imdbID' in lines:
                a2, b3, c2 = lines.partition(':')
                c2 = c2.lstrip(' "')
                c2 = c2.rstrip('",\n')
                b2.append(c2 + '.jpg')
            if "Genre" in lines:
                a, b1, c = lines.partition(':')
                c = c.lstrip(' "')
                c = c.rstrip('",\n')
                c1 = c.split(',')
                c1 = map(str.strip, c1)
                b.append(list(c1))

data = pd.DataFrame({'name': b2, 'filename': b})
Now for the second step, we first collect all the poster image filenames in a list, as sketched below.
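A sketch of this step (the folder layout, one sub-folder per year next to the groundtruth files, is an assumption):

import os

# collect the filenames of all the poster images
poster_files = []
for year in range(1980, 2016):
    folder = 'D:/downloads/Movie_Poster_Dataset/{}'.format(year)   # assumed layout
    poster_files.extend(os.listdir(folder))

# keep only the movies whose poster image actually exists
data = data[data['name'].isin(poster_files)].reset_index(drop=True)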
So, finally, we are ready with our cleaned dataset with 8052 images containing overall 25 classes. The dataframe is shown below.
Format 1
One can also convert this dataframe into the common format as shown below
Format 2
This can be done using the following code.
# df: the dataframe created above; one new column per genre, marked '1' if present
for idx, row in df.iterrows():
    for genre in row.filename:
        df.loc[idx, genre] = '1'
df.fillna('0', inplace=True)
In this post, we will be using Format 1. You can use any. Here, we will be using the Keras flow_from_dataframe method. For this, we need to place all the images under one directory. Currently, all the images are in separate folders such as 1980, 1981, etc. Below is the code that places all the poster images in a single folder ‘original_train‘.
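A sketch of that step (the source and destination paths are assumptions):

import os
import shutil

src_root = 'D:/downloads/Movie_Poster_Dataset'      # assumed location of the year folders
dst = 'D:/downloads/original_train'                 # single destination folder
os.makedirs(dst, exist_ok=True)

for year in range(1980, 2016):
    folder = os.path.join(src_root, str(year))
    for fname in os.listdir(folder):
        shutil.copy(os.path.join(folder, fname), os.path.join(dst, fname))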
Here, I’ve used both accuracy and top_k_categorical_accuracy as metrics, to show how accuracy instantly reaches 90+ from the starting epoch and thus is not a correct metric for this problem.
flow_from_dataframe()
Here, I split the data into training and validation sets using the validation_split argument of ImageDataGenerator. You can read more about the ImageDataGenerator here.
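A sketch of the generators (the directory, target size, and 20% validation split are assumptions; with y_col holding lists of genres, class_mode='categorical' gives multi-hot labels):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_gen = datagen.flow_from_dataframe(dataframe=data,
                                        directory='D:/downloads/original_train',
                                        x_col='name',            # poster filenames
                                        y_col='filename',        # lists of genres (Format 1)
                                        class_mode='categorical',
                                        target_size=(200, 150),
                                        batch_size=32,
                                        subset='training')

val_gen = datagen.flow_from_dataframe(dataframe=data,
                                      directory='D:/downloads/original_train',
                                      x_col='name',
                                      y_col='filename',
                                      class_mode='categorical',
                                      target_size=(200, 150),
                                      batch_size=32,
                                      subset='validation')

model.fit_generator(train_gen, epochs=20, validation_data=val_gen)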
See how accuracy reaches 90+ within a few epochs. As stated earlier, this is not a good evaluation metric for multi-label classification. On the other hand, top_k_categorical_accuracy shows us the true picture.
Clearly, we are doing a pretty decent job, considering that the training data is small and the problem is complex (25 classes). Moreover, some classes like comedy, etc. dominate the training data. Play with the model architecture and other hyperparameters and check how the accuracy varies.
Prediction time
For each image, let’s predict the top three predicted classes. Below is the code for this.
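A sketch of this (the poster filename is one from the dataset; class_indices from the generator gives the genre names in the order of the model’s outputs):

import numpy as np
from keras.preprocessing import image

class_names = list(train_gen.class_indices.keys())

img = image.load_img('D:/downloads/original_train/tt0465602.jpg',
                     target_size=(200, 150))            # same size as during training
x = image.img_to_array(img) / 255.0
pred = model.predict(np.expand_dims(x, axis=0))[0]

top3 = np.argsort(pred)[::-1][:3]                       # indices of the three highest scores
print([class_names[i] for i in top3])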
You can see that our model is doing a decent job considering the complexity of the problem
Let’s try another example “tt0465602.jpg“. For this the predicted labels are
By looking at the poster most of us will predict the labels as predicted by our algorithm. Actually, these are pretty close to the true labels that are [Action, Comedy, Crime].
That’s all for multi-label classification problem. Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In neural networks, a good way to debug is to look at the relationship between the cost and the number of iterations. This not only ensures that the optimizer is working properly but can also be very useful as an indication of overfitting. Moreover, we can also debug the learning rate based on this relationship. Thus, one should always keep track of the loss and the accuracy metrics while training a neural network.
Fortunately, in Keras, we don’t need to write a single extra line of code to store all these values. Keras automatically keeps the record of all the events for each epoch. This includes loss and accuracy metrics for both training and validation sets (if used). This is done using the History callback which is automatically applied to every Keras model. This callback records all the events into a History object that gets returned by the fit() method.
How does this work?
First, at the onset of training, this creates an empty dictionary to store all the events. Then at every epoch end, all the events are appended into the dictionary. Below is the code for this taken from the Keras GitHub.
def on_train_begin(self, logs=None):
    self.epoch = []
    self.history = {}

def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}
    self.epoch.append(epoch)
    for k, v in logs.items():
        self.history.setdefault(k, []).append(v)
How to use this?
Since all the saved records are returned by the fit() method, we can simply store all the events in any variable. Here, I’ve used “record” as the variable name.
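For instance, a small self-contained sketch (an MNIST toy model, assumed purely for illustration; the fit arguments match the params shown further below):

from keras.datasets import mnist
from keras import layers, models

(x_train, y_train), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255

model = models.Sequential([layers.Dense(128, activation='relu', input_shape=(784,)),
                           layers.Dense(10, activation='softmax')])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])

# the History object returned by fit() is stored in "record"
record = model.fit(x_train, y_train,
                   batch_size=128,
                   epochs=5,
                   validation_split=0.2)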
Now, using this record object, we can retrieve any information about the training process. For instance, “record.epoch” returns the list of epochs.
record.epoch
>>>[0,1,2,3,4]
“record.history” returns the dictionary containing the event names as the dictionary keys and their values at each epoch in a list.
record.history
>>>{'val_loss':[0.037735904552973806,
0.0383092053757185,
0.03597573842055863,
0.04122187125845812,
0.04817074624549908],
'val_acc':[0.9889166665077209,
0.9899166665077209,
0.9912499998410543,
0.989416666507721,
0.988666666507721],
'loss':[0.01413803466820779,
0.010153079537209123,
0.008668457060974712,
0.008247203289516619,
0.007816496806034896],
'acc':[0.9952291666666667,
0.9967083333333333,
0.9972291666666667,
0.9971875,
0.9975208333333333]}
You can retrieve all the event names using the following command.
record.history.keys()
>>>dict_keys(['val_loss','val_acc','loss','acc'])
You can also get the information about the parameters used while fitting the model. This can be done using the following command.
record.params
>>>{'batch_size':128,
'epochs':5,
'steps':None,
'samples':48000,
'verbose':1,
'do_validation':True,
'metrics':['loss','acc','val_loss','val_acc']}
Not only this, but one can also check which data is used as the validation data using the following command.
record.validation_data[0]
These are just a few of the functionalities available with the History callback. You can check more of these on the Keras GitHub.
Plot the training history
Since all the events are stored in a dictionary, one can easily plot these using any plotting library. Here, I’m using Matplotlib. Below is the code for plotting the loss curves for both training and validation sets.
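A minimal sketch, using the keys shown above (the record object comes from the fit() call earlier):

import matplotlib.pyplot as plt

plt.plot(record.epoch, record.history['loss'], label='training loss')
plt.plot(record.epoch, record.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()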
One common problem that we face while training a neural network is of overfitting. This refers to a situation where the model fails to generalize. In other words, the model performs poorly on the test/validation set as compared to the training set. Take a look at the plot below.
Clearly, after ‘t’ epochs, the model starts overfitting. This is clear from the increasing gap between the train and the validation error in the above plot. Wouldn’t it be nice if we stopped the training where the gap starts increasing? This would help prevent the model from overfitting. This method is known as Early Stopping. Some of the pros of using this method are
Prevents the model from overfitting
Parameter-free unlike other regularization techniques like L2 etc.
Removes the need to manually set the number of epochs. Because now the model will automatically stop training when the monitored quantity stops improving.
Fortunately, in Keras, this is done using the EarlyStopping callback. So, let’s first discuss its Keras API and then we will learn how to use this.
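For reference, the callback with its default arguments (these are the arguments discussed below):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss',
                           min_delta=0,
                           patience=0,
                           verbose=0,
                           mode='auto',
                           baseline=None,
                           restore_best_weights=False)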
In this, you first need to provide which quantity to monitor using the “monitor” argument. This can take a value from ‘loss’, ‘acc’, ‘val_loss’, ‘val_acc’ or ‘val_metric’ where metric is the name of the metric used. For instance, if the metric is set to ‘mse’ then pass ‘val_mse’.
After setting the monitored quantity, you need to decide whether you want to minimize or maximize it. For instance, we want to minimize loss and maximize accuracy. This can be done using the “mode” argument. This can take value from [‘min‘, ‘max‘, ‘auto‘]. Default is the ‘auto’ mode. In ‘auto’ mode, this automatically infers whether to maximize or minimize depending upon the monitored quantity name.
This stops training whenever the monitored quantity stops improving. By default, any fractional change is considered an improvement. For instance, if ‘val_acc’ increases from 90% to 90.0001%, this is also considered an improvement. The meaning of improvement may vary from one application to another. So, here we have the “min_delta“ argument. Using this we can set the minimum change in the monitored quantity that qualifies as an improvement. For instance, if min_delta=1, then any absolute change of less than 1 will count as no improvement.
Note: This difference is calculated as the current monitored quantity value minus the best-monitored quantity value until now.
As we already know that neural networks mostly face the problem of plateaus. So monitored quantity may not show improvement for some time and then improve afterward. So, it’s better to wait for a few epochs before making the final decision to stop the training process. This can be done using the “patience” argument. For instance, a patience=3 means if the monitored quantity doesn’t improve for 3 epochs, stop the training process.
The model will stop training some epochs (specified by the “patience” argument) after the best-monitored quantity value. So, the weights you will get are not the best weights. To retrieve the best weights, set the “restore_best_weights” argument to True.
Sometimes for a task we have a baseline in mind, for instance that we should get at least 75% accuracy within 5 epochs. If we are not getting this, there is no point in training the model any further; we should instead change the hyperparameters and retrain. In this callback, you can set such a baseline using the “baseline” argument. If the monitored quantity does not surpass the baseline (by at least min_delta) within the number of epochs specified by the patience argument, the training process is stopped.
For instance, below is an example where the baseline is set to 98%.
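A sketch of such a setup (the model and data are whatever you are training, e.g. the toy model above; only the callback settings matter here):

from keras.callbacks import EarlyStopping

# stop training if validation accuracy hasn't reached 98% within 5 epochs
early_stop = EarlyStopping(monitor='val_acc',
                           baseline=0.98,
                           patience=5)

record = model.fit(x_train, y_train,
                   validation_split=0.2,
                   epochs=50,
                   callbacks=[early_stop])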