Category Archives: Image Processing

Blur Detection using the variance of the Laplacian method

In the previous blog, we discussed how to detect low contrast images using the scikit-image library. Similar to low contrast images, blurred images also don’t provide any additional information for our task. So, it’s better to discard these blurred images before performing any computer vision or other task. Blur detection is an active research topic and several algorithms have been proposed, not only for detecting blur but also for deblurring the image. So, in this blog, we will discuss one such simple yet effective method for detecting blur. So, let’s get started.

As we all know, a blurry image doesn’t have well-defined edges. So, if you calculate the Laplacian of this image, you will get more or less the same response everywhere. In other words, the variance of this Laplacian image will be low. Now the main question is how low is low enough. For this, you choose a threshold: if the variance is less than this threshold, the image is considered blurred; otherwise, it is not.

So, for a blurred image, the variance of the Laplacian will be lower compared to a sharp image. That is why this method is known as the variance of the Laplacian.

Now, the main thing is to set a threshold that decides if an image is blurred or not. This is the tricky part, and it all depends upon your application. So you may need to try out different threshold values and pick the one that works well for your application. I hope you understood this. Now, let’s see how to implement this using OpenCV-Python.

Steps

  • Load the image
  • Convert this to greyscale
  • Calculate the Laplacian of this image and find the variance
  • If variance < threshold then blurred, otherwise not
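A minimal sketch of these steps in OpenCV-Python is shown below (the filename and the threshold value of 100 are placeholders; tune them for your data):

import cv2

# load the image and convert it to greyscale (filename is a placeholder)
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# variance of the Laplacian
variance = cv2.Laplacian(gray, cv2.CV_64F).var()

threshold = 100   # tune this value for your application
if variance < threshold:
    print('Blurred image, variance of Laplacian =', variance)
else:
    print('Not blurred, variance of Laplacian =', variance)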

So this is how this method works. Since the Laplacian is very sensitive to noise, this may not always give good results. Setting a good threshold value is also tricky. This method is fast and easy to implement but is not guaranteed to work in every case. As mentioned, this is an active research area, so in the next blog, we will use the Fourier transform and see how it performs. That’s all for this blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Finding Corners with SubPixel Accuracy

In the previous blogs, we discussed how to find corners using algorithms such as Harris Corner, Shi-Tomasi, etc. If you notice, the detected corners had integer coordinates such as (17,34). This generally works fine if we are extracting these features for recognition purposes, but when it comes to geometrical measurements we need more precise corner locations, such as real-valued coordinates (17.35, 34.67). So, in this blog, we will see how to refine the corner locations (detected using the Harris or Shi-Tomasi detector) with sub-pixel accuracy.

OpenCV

OpenCV provides a built-in function cv2.cornerSubPix() that finds the sub-pixel accurate locations of the corners. Below is the syntax of this function.
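In OpenCV-Python the call looks like this (it returns the refined corner locations; the argument names follow the OpenCV documentation):

corners = cv2.cornerSubPix(image, corners, winSize, zeroZone, criteria)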

This function uses the dot product trick and iteratively refines the corner locations until the termination criteria are reached. Let’s understand this in a bit more detail.

Consider the image shown below. Suppose q is the starting corner location and p is a point located within the neighborhood of q.

Clearly, the dot product between the gradient at p and the vector q−p is 0. In the first case, p0 lies in a flat region, so the gradient is 0 and hence the dot product is 0. In the second case, the vector q−p1 lies along the edge, and since the gradient is perpendicular to the edge, the dot product is again 0.

Similarly, we take other points in the neighborhood of q (defined by the winSize parameter) and set the dot product of the gradient at each point and the corresponding vector to 0, as we did above. Doing so, we get a system of equations. These equations form a linear system that can be solved by the inversion of a single autocorrelation matrix. But this matrix is not always invertible owing to small eigenvalues arising from the pixels very close to q. So, we simply reject the pixels in the immediate neighborhood of q (defined by the zeroZone parameter).

This will give us the new location for q. Now, this will become our starting corner location. Keep iterating until the user-specified termination criterion is reached. I hope you understood this.

Now, let’s take a look at the arguments that this function accepts.

  • image: Input single-channel, 8-bit grayscale or float image
  • corners: Array that holds the initial approximate location of corners
  • winSize: Size of the neighborhood where it searches for corners. This is half of the side length of the search window. For example, if winSize=(5,5), then a (5∗2+1)×(5∗2+1) = 11×11 search window is used
  • zeroZone: This is half of the neighborhood size we want to reject. If you don’t want to reject anything, pass (-1,-1)
  • criteria: Termination criteria. The refinement stops after a specified number of iterations or once a certain accuracy is achieved, whichever occurs first.

For instance, in the above image the red pixel is the initial corner. The winSize is (3,3) and the zeroZone is (1,1). So, only the green pixels have been considered for generating equations while the grey pixels have been rejected.

Now, let’s take the below image and see how to do this using OpenCV-Python

Steps

  • Load the image and find the corners using the Harris Corner Detector as we did in the previous blog. You can also use the Shi-Tomasi detector
  • Now, there may be a bunch of pixels at each corner, so we take their centroids
  • Then, we define the stopping criteria and refine the corners to subpixel accuracy using cv2.cornerSubPix()
  • Finally, we mark the Harris corners in red and the refined corners in green
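Below is a minimal sketch of these steps (the filename is a placeholder, and the Harris parameters, window size, and termination criteria are example values you may need to tune):

import cv2
import numpy as np

img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Harris corner detection on a float32 image
dst = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)
dst = cv2.dilate(dst, None)
ret, dst = cv2.threshold(dst, 0.01 * dst.max(), 255, 0)
dst = np.uint8(dst)

# take the centroids of the connected corner regions
ret, labels, stats, centroids = cv2.connectedComponentsWithStats(dst)

# define the stopping criteria and refine the corners to subpixel accuracy
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv2.cornerSubPix(gray, np.float32(centroids), (5, 5), (-1, -1), criteria)

# mark Harris centroids in red and refined corners in green (BGR order)
res = np.round(np.hstack((centroids, corners))).astype(int)
img[res[:, 1], res[:, 0]] = [0, 0, 255]
img[res[:, 3], res[:, 2]] = [0, 255, 0]
cv2.imwrite('subpixel.jpg', img)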

Below are the results of this. For visualization, I have shown the zoomed-in version on the right.

Applications

Subpixel corner locations are commonly used in camera calibration, in tracking to reconstruct the camera’s path or the three-dimensional structure of a tracked object, and in some algorithms such as SIFT (discussed in the next blog).

That’s all for this blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

SIFT: Scale-Space Extrema Detection

In the previous blog, we had an overview of the SIFT algorithm. We discussed different steps involved in this and the invariance that it offers against scale, rotation, illumination, viewpoint, etc. But we didn’t discuss it in detail. So, in this blog, let’s start with the first step which is scale-space extrema detection. So, let’s get started.

Before moving forward, let’s quickly recap what we are doing, why we are doing it, and how we are doing it.

  • We saw how corner detectors like Harris, Shi-Tomasi, etc. suffer when it comes to scaling. (Why we are doing it)
  • So, we want to detect features that are invariant to scale changes and can be robustly detected. (What we are doing)
  • This can be done by searching for stable features (extrema) across all possible scales, using a continuous function of scale known as scale space. That’s why the name scale-space extrema detection. (How we are doing it)

I hope you understood this. Now, let’s understand what is scale-space.

What is a Scale-Space?

As we all know, real-world objects are composed of different structures at different scales. For instance, the concept of a “tree” is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. It would make no sense to analyze leaves or molecules at the scale of the tree (meters). So, this means that you need to analyze everything at an appropriate scale in order to make sense of it.

But given an image of an unknown scene, there is no a priori way to determine what scales are appropriate for describing the interesting structures in the image data. Hence, the only reasonable approach is to consider descriptions at multiple scales. This representation of images at multiple scales constitutes a so-called scale-space.

How to construct a Scale-Space?

Now, the next question is how to construct a scale-space. As we know, if we increase the scale, the fine details are lost and only coarser information prevails. Can you relate this to something you have done in image processing? Does blurring an image with a low-pass filter sound similar? The answer is yes, but there is a catch. We can’t use just any low-pass filter; only the Gaussian filter helps in mimicking a scale space. This is because the Gaussian filter has been shown to be the only filter that obeys the following

  • Linearity
  • Shift-invariance
  • Smoothing does not produce new structures when going from a fine to a coarser scale
  • Rotational symmetry, and some other properties (you can read more about them on Wikipedia)

So, to create a scale space, you take the original image and generate progressively blurred-out images using a Gaussian filter. Mathematically, the scale-space representation of an image can be expressed as

L(x,y,σ) = G(x,y,σ) ∗ I(x,y)

where ∗ denotes convolution and

  • L(x,y,σ) is the blurred image or scale space representation of an image
  • G(x,y,σ) is the Gaussian filter
  • I(x,y) is the image
  • σ is the scaling parameter or the amount of blur. As we increase σ, more and more details are removed from the image i.e. more blur

See below, where an image is shown at different scales (σ) (source: Wikipedia). Notice how, at larger scales, the fine details are lost.

So, I hope you understood what a scale-space is and how to construct it using a Gaussian filter.

Scale-Space in SIFT

In the SIFT paper, the authors modified the scale-space representation. Instead of creating the scale-space representation for the original image only, they created the scale-space representations for different image sizes. This helps in increasing the number of keypoints detected. The idea is shown below

Take the original image and generate progressively blurred-out images. Then resize the original image to half its size and generate blurred-out images again. Keep repeating this process. This is shown below

Here, we use the term octave to denote the scale-space representation for a particular image size. For instance, all the images of the same size in a vertical line form one octave. Here, we have 3 octaves and each octave contains 4 images at different scales (blurred using a Gaussian filter).

Within an octave, the adjacent scales differ by a constant factor k. If an octave contains s+1 images, then k = 2^(1/s). The first image has scale σ0, the second image has scale kσ0, the third image has scale k^2·σ0, and the last image has scale k^s·σ0. In the paper, they have used the values number of octaves = 4, number of scale levels = 5, initial σ0 = 1.6, k = √2, etc.
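To make this construction concrete, here is a simplified sketch of building such a pyramid with OpenCV-Python. This is only an illustration, not the exact SIFT implementation (SIFT blurs incrementally and handles the base σ more carefully), and the filename and parameter values are placeholders:

import cv2

def build_gaussian_scale_space(image, num_octaves=4, images_per_octave=4, sigma0=1.6):
    # if an octave contains s+1 images, adjacent scales differ by k = 2^(1/s)
    k = 2 ** (1.0 / (images_per_octave - 1))
    octaves = []
    base = image.copy()
    for _ in range(num_octaves):
        octave = []
        for i in range(images_per_octave):
            sigma = sigma0 * (k ** i)
            octave.append(cv2.GaussianBlur(base, (0, 0), sigmaX=sigma))
        octaves.append(octave)
        # halve the image size for the next octave
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2),
                          interpolation=cv2.INTER_NEAREST)
    return octaves

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
scale_space = build_gaussian_scale_space(img)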

How to decide the number of octaves and number of scales per octave?

The number of octaves and scales depends on the size of the original image. You need to adjust this yourself depending upon the application.

But it has been found empirically that sampling 3 scales per octave provides optimal repeatability under downsampling/upsampling/rotation of the image as well as image noise. Adding more scales per octave will increase the number of detected keypoints, but this does not improve the repeatability (in fact, there is a small decrease), so we settle for the computationally less expensive option. See the plot below

So, once we have constructed the scale-space, the next task is to detect the extrema in this scale-space. That’s why this step is called scale-space extrema detection. To keep this blog short, we will discuss this in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Detecting low contrast images using Scikit-image

In the previous blog, we discussed what is contrast in image processing and how histograms can help us distinguish between low and high contrast images. If you remember, for a high contrast image, the histogram spans the entire dynamic range while for low contrast the histogram covers only a narrow range as shown below

So, just by looking at the histogram of an image, we can tell whether this has low or high contrast.

Problem

But what if you have a large number of images, for instance when training a computer vision model? In that case, we generally want to remove these low contrast images as they don’t provide enough information about the task. But manually examining the histogram of each image would be a tedious and time-consuming task. So, we need to find a way to automate this process.

Solution

Luckily, scikit-image provides a built-in function is_low_contrast() that determines whether an image is low contrast or not. This function returns a boolean where True indicates low contrast. Below is the syntax of this function.
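The function lives in the exposure module; its signature looks roughly like this (the defaults shown are from recent scikit-image versions, so check your installed version):

from skimage.exposure import is_low_contrast

is_low_contrast(image, fraction_threshold=0.05, lower_percentile=1,
                upper_percentile=99, method='linear')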

Below is the algorithm that this function uses

  • First, this function converts the image to greyscale
  • Then this disregards the image intensity values below lower_percentile and above upper_percentile. This is similar to the percentile stretching that we did earlier (See here)
  • Then this calculates the full brightness range for the given image datatype. For instance, for 8-bit, the full brightness range is [0, 255]
  • Finally, this calculates the ratio of the image brightness range to the full brightness range. If this ratio is less than a set threshold (see the fraction_threshold argument above), then the image is considered low contrast. For instance, for an 8-bit image, if the image brightness range is [100, 150] and the threshold is 0.1, then the ratio will be 50/255, which is approximately 0.196. Since this is greater than the threshold, the image is considered high contrast. You need to change this threshold according to your application

I hope you understood this. Now, let’s take an example and see how to implement this.
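A minimal sketch is shown below (the filename and the fraction_threshold of 0.35 are placeholders; adjust them for your application):

import cv2
from skimage.exposure import is_low_contrast

img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

if is_low_contrast(gray, fraction_threshold=0.35):
    print('image has low contrast')
else:
    print('image has high contrast')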

So, for the below image, this function outputs ‘image has low contrast’ corresponding to the given threshold.

I hope you understood this. Now, in the pre-processing step, you can check whether the image has high or low contrast and then take action accordingly. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Introduction to SIFT (Scale-Invariant Feature Transform)

In the previous blogs, we discussed some corner detectors such as Harris Corner, Shi-Tomasi, etc. If you remember, these corner detectors were rotation invariant, which basically means that even if the image is rotated, we would still be able to detect the same corners. This is obvious because corners remain corners in the rotated image as well. But when it comes to scaling, these algorithms suffer and don’t give satisfactory results. This is obvious because if we scale the image, a corner may not remain a corner. Let’s understand this with the help of the following image (Source: OpenCV)

See on the left we have a corner in the small green window. But when this corner is zoomed (see on the right), it no longer remains a corner in the same window. So, this is the issue that scaling poses. I hope you understood this.

So, to solve this, in 2004, D. Lowe of the University of British Columbia, in his paper Distinctive Image Features from Scale-Invariant Keypoints, came up with a new algorithm, the Scale-Invariant Feature Transform (SIFT). This algorithm not only detects the features but also describes them. And the best thing about these features is that they are invariant to changes in

  • Scale
  • Rotation
  • Illumination (partially)
  • Viewpoint (partially)
  • Minor image artifacts/ Noise/ Blur

That’s why this was a breakthrough in this field at that time. So, you can use these features to perform different tasks such as object recognition, tracking, image stitching, etc, and don’t need to worry about scale, rotation, etc. Isn’t this cool and that too around 2004!!!

There are mainly four steps involved in the SIFT algorithm to generate the set of image features

  • Scale-space extrema detection: As is clear from the name, first we search over all scales and image locations (space) and determine the approximate location and scale of feature points (also known as keypoints). In the next blog, we will discuss how this is done, but for now just remember that the first step simply finds the approximate location and scale of the keypoints
  • Keypoint localization: In this, we take the keypoints detected in the previous step and refine their location and scale to subpixel accuracy. For instance, if the approximate location is 17, then after refinement this may become 17.35 (more precise). Don’t worry, we will discuss how this is done in the next blogs. After the refinement step, we discard bad keypoints such as edge points and low contrast keypoints. So, after this step we get a robust set of keypoints.
  • Orientation assignment: Then we calculate the orientation for each keypoint using its local neighborhood. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
  • Keypoint descriptor: All the previous steps ensured invariance to image location, scale, and rotation. Finally, we create the descriptor vector for each keypoint such that the descriptor is highly distinctive and partially invariant to the remaining variations such as illumination, 3D viewpoint, etc. This helps in uniquely identifying features. Once we have obtained these features along with their descriptors, we can do whatever we want, such as object recognition, tracking, stitching, etc. This sums up the SIFT algorithm at a coarser level.
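As a quick preview of where we are heading, recent opencv-python builds expose SIFT directly (in older builds it lived in the contrib module as cv2.xfeatures2d.SIFT_create()); a minimal sketch with a placeholder filename looks like this:

import cv2

img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

out = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('sift_keypoints.jpg', out)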

Because SIFT is an extensive algorithm, we won’t be covering it in a single blog. We will understand each of these 4 steps in separate blogs and finally, we will implement it using OpenCV-Python. And as we proceed, we will also understand how this algorithm achieves scale, rotation, illumination, and viewpoint invariance as discussed above.

So, in the next blog, let’s start with the scale-space extrema detection and understand this in detail. See you in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Shi-Tomasi Corner Detector

In the previous blog, we discussed the Harris Corner Detector and saw how it uses a score function to evaluate whether a point is a corner or not. But this algorithm doesn’t always yield satisfactory results. So, in 1994, J. Shi and C. Tomasi, in their paper Good Features to Track, made a small modification to it that shows better results compared to the Harris Corner Detector. So, let’s understand how they improved the algorithm.

As you might remember, the scoring function used by the Harris Corner Detector is

R = det(M) − k(trace(M))² = λ1λ2 − k(λ1 + λ2)²

Instead of this, Shi-Tomasi proposed a new scoring function

R = min(λ1, λ2)

So, for a pixel, if this score R is greater than a certain threshold, then that pixel is considered a corner. Similar to the Harris Corner Detector, if we plot this in λ1−λ2 space, we get the plot below

So, as we can see,

  • only when λ1 and λ2 are above a minimum value, λmin, is the point considered a corner (green region)
  • when either λ1 or λ2 is below the minimum value λmin, it is considered an edge (orange region)
  • when both λ1 and λ2 are below the minimum value λmin, it is considered a flat region (grey region)

So, this is the improvement that Shi-Tomasi did to the Harris Corner Detector. Other than this, the entire algorithm is the same. Now, let’s see how to implement this using OpenCV-Python.

OpenCV

OpenCV provides a built-in function cv2.goodFeaturesToTrack() that finds the N strongest corners in the image using either the Shi-Tomasi or Harris Corner Detector. Below is the algorithm that this function uses

  • First, this function calculates the corner quality score at every pixel using either Shi-Tomasi or Harris Corner
  • Then this function performs a non-maximum suppression (the local maximums in 3 x 3 neighborhood are retained).
  • After this, all the corners with a quality score less than qualityLevel · max(qualityScore) are rejected, where max(qualityScore) is the quality score of the best corner in the image. For instance, if the best corner has a quality score of 1500 and qualityLevel=0.01, then all the corners with a quality score less than 15 are rejected.
  • Now, all the remaining corners are sorted by the quality score in the descending order.
  • The function throws away each corner for which there is a stronger corner at a distance less than minDistance.

Here is the syntax of this function
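In OpenCV-Python the essential call is as follows (the remaining arguments such as mask, blockSize, useHarrisDetector, and k are optional):

corners = cv2.goodFeaturesToTrack(image, maxCorners, qualityLevel, minDistance)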

Now, let’s take the image we used in the previous blog and detect the top 20 corners. Below is the code for this
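A minimal sketch is shown below (the filename and the qualityLevel/minDistance values are placeholders you may need to tune):

import cv2
import numpy as np

img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detect the 20 strongest corners using the Shi-Tomasi method
corners = cv2.goodFeaturesToTrack(gray, 20, 0.01, 10)

for c in corners:
    x, y = c.ravel()
    cv2.circle(img, (int(x), int(y)), 3, (0, 0, 255), -1)

cv2.imwrite('shi_tomasi.jpg', img)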

Below is the result of this

You can also use the Harris Corner Detector method by specifying the flag useHarrisDetector and the k parameter in the above function as shown
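For example (continuing the sketch above, with k=0.04 as a typical value):

corners = cv2.goodFeaturesToTrack(gray, 20, 0.01, 10,
                                  useHarrisDetector=True, k=0.04)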

So, that’s all about Shi-Tomasi Detector.

Limitations

Both Shi-Tomasi and Harris Corner work well for most cases, but when the scale of the image changes, both of these algorithms don’t give satisfactory results. So, in the next blog, we will discuss one of the famous algorithms for finding scale-invariant features, known as SIFT (Scale-Invariant Feature Transform). This algorithm was a breakthrough in this field. See you in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Harris Corner Detection

In the previous blog, we discussed what are features and how corners are considered as a good feature as compared to edges and flat surfaces. In this blog, let’s discuss one of the famous and most commonly used corner detection methods known as Harris Corner Detection. This was one of the early attempts to find the corners by Chris Harris & Mike Stephens in their paper A Combined Corner and Edge Detector in 1988. Now it is called the Harris Corner Detector. So, let’s first understand the basic idea behind this algorithm, and then we will dive into mathematics. Let’s get started.

As discussed in the previous blog, corners are regions in the image with large variations in intensity in all directions. For instance, take a look at the below image. If you shift the window by a small amount, then corners will produce a significant change in all directions while edges will output no change if we move the window along the edge direction. And the flat region will output no change in all directions on window movement.

So, the authors took this simple idea of finding the difference in intensity for a displacement of (u,v) in all directions and put it into a mathematical form. This is expressed as

E(u,v) = Σ w(x,y) [I(x+u, y+v) − I(x,y)]²

where the sum runs over all pixels (x,y) in the window.

Here,

  • the window function is either a rectangular window or a Gaussian window which gives weights to pixels underneath.
  • E(u,v) is the difference in intensities between the original and the moved window.

As can be clearly seen, for nearly constant patches the error function will be close to 0, while for distinctive patches it will be larger. Hence, our aim is to find patches where this error function is large. In other words, we need to maximize this error function for corner detection. That means we have to maximize the second term. We can do this by applying a Taylor expansion and a few mathematical steps, as shown below

So, the final equation becomes

E(u,v) ≈ [u v] M [u v]ᵀ

where

M = Σ w(x,y) [[Ix·Ix, Ix·Iy], [Ix·Iy, Iy·Iy]]

and Ix, Iy are the image derivatives in the x and y directions, with the sum again taken over the window.

Then comes the main part. As we have already discussed, corners are regions in the image with large variations in intensity in all directions. Or, in terms of the above matrix M, “a corner is characterized by a large variation of M in all directions of the vector [u,v]”. If you remember, eigenvalues tell us about the variance, so by simply analyzing the eigenvalues of the matrix M we can infer the result.

But the authors note that the exact computation of the eigenvalues is computationally expensive, since it requires the computation of a square root, and instead suggest the following score function, which determines whether a window contains a corner or not:

R = det(M) − k(trace(M))²

where det(M) = λ1λ2, trace(M) = λ1 + λ2, and k is an empirically chosen constant (typically 0.04–0.06).

Therefore, the algorithm does not have to actually compute the eigenvalue decomposition of the matrix M and instead it is sufficient to evaluate the determinant and trace of matrix M to find the corners.

Now, depending upon the magnitude of the eigenvalues and the score (R), we can decide whether a region is a corner, an edge, or flat.

  • When |R| is small, which happens when λ1 and λ2 are small, the region is flat.
  • When R<0, which happens when λ1>>λ2 or vice versa, the region is an edge.
    • If λ1>>λ2, then vertical edge
    • otherwise horizontal edge
  • When R is large, which happens when λ1 and λ2 are large and λ1∼λ2, the region is a corner

This can also be represented by the below image

So, this algorithm will give us a score corresponding to each pixel. Then we need to do thresholding in order to find the corners.

Because we consider only the eigenvalues of the matrix (M), we are considering quantities that are invariant also to rotation, which is important because objects that we are tracking might rotate as well as move. So, this makes this algorithm rotation invariant.

So, this concludes the Harris Corner Detector. I hope you understood this. Now, let’s see how to do this using OpenCV-Python.

OpenCV

OpenCV provides a built-in function cv2.cornerHarris() that runs the Harris corner detector on the image. Below is the syntax for this.
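In OpenCV-Python the call looks like this (src must be a single-channel float32 image):

dst = cv2.cornerHarris(src, blockSize, ksize, k)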

For each pixel (x,y), it calculates a 2×2 gradient covariance matrix M(x,y) over a blockSize×blockSize neighborhood. Then, using this matrix M, it calculates the score for each pixel. Below is the code for this
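Below is a minimal sketch (the filename and the 0.01 · dst.max() threshold are placeholders; blockSize=2, ksize=3, and k=0.04 are typical values):

import cv2
import numpy as np

img = cv2.imread('image.jpg')
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# blockSize=2, Sobel aperture ksize=3, Harris parameter k=0.04
dst = cv2.cornerHarris(gray, 2, 3, 0.04)

# dilate to mark the corners, then threshold on the score and draw them in red
dst = cv2.dilate(dst, None)
img[dst > 0.01 * dst.max()] = [0, 0, 255]

cv2.imwrite('harris_corners.jpg', img)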

Below is the result of this.

So, this is how you can implement the Harris Corner Detector using OpenCV-Python. I hope you understood this. In the next blog, we will discuss the Shi-Tomasi algorithm that improves this Harris Corner Detector even further. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Feature Detection, Description, and Matching

In the previous blogs, we discussed different segmentation algorithms such as watershed, grabcut, etc. From this blog, we will start another interesting topic known as Feature Detection, Description, and Matching. This has many applications in the field of computer vision such as image stitching, object tracking, serving as the first step for many computer vision applications, etc. Over the past few decades, a number of algorithms have been proposed, but before diving into these algorithms, let’s first understand what features are in general and why they are important. So, let’s get started.

What is a Feature?

According to Wikipedia, a feature is any piece of information that is relevant for solving any task. For instance, let’s say we have the task of identifying an apple in the image. So, the features useful in this case can be shape, color, texture, etc.

Now that you know what features are, let’s try to understand which features are more important than others. For this, let’s take the example of image matching. Suppose you are given two images (see below) and your task is to match the rectangle present in the first image with the one in the other. And let’s say you are given 3 feature points: A (flat area), B (edge), and C (corner). So now the question is, which of these is a better feature for matching the rectangle?

Clearly, A is a flat area. So, it’s difficult to find the exact location of this point in the other image. Thus, this is not a good feature point for matching. For B (edge), we can find the approximate location but not the accurate location. An edge is, therefore, a better feature compared to the flat area, but not good enough. But we can easily and accurately locate C (corner) in the other image, and it is thus considered a good feature. So, corners are considered to be good features in an image. These feature points are also known as interest points.

What is a good feature or interest point?

A good feature or interest point is one that is robust to changes in illumination or brightness and scale, and can be reliably computed with a high degree of repeatability. It should also give us enough knowledge about the task (see the corner feature points for matching above). Also, a good feature should be unique, distinctive, and global.

So, I hope now you have some idea about the features. Now, let’s take a look at some of the applications of Feature Detection, Description, and Matching.

Applications

  • Object tracking
  • Image matching
  • Object Recognition
  • 3D object reconstruction
  • Image stitching
  • Motion-based segmentation

All these applications follow the same general steps i.e. Feature Detection, Feature Description, and Feature Matching. All these steps are discussed below.

Steps

First, we detect all the feature points. This is known as Feature Detection. There are several algorithms developed for this such as

  • Harris Corner
  • SIFT(Scale Invariant Feature Transform)
  • SURF(Speeded Up Robust Feature)
  • FAST(Features from Accelerated Segment Test)
  • ORB(Oriented FAST and Rotated BRIEF)

We will discuss each of these algorithms in detail in the next blogs.

Then we describe each of these feature points. This is known as Feature Description. Suppose we have 2 images as shown below. Both of these contain corners. So, the question is: are they the same or different?

Obviously, both are different as the first one contains a green area to the lower right while the other one has a green area to the upper right. So, basically what you did is you described both these features and that has led us to answer the question. Similarly, a computer also should describe the region around the feature so that it can find it in other images. So, this is the feature description. There are also several algorithms for this such as

  • SIFT(Scale Invariant Feature Transform)
  • SURF(Speeded Up Robust Feature)
  • BRISK (Binary Robust Invariant Scalable Keypoints)
  • BRIEF (Binary Robust Independent Elementary Features)
  • ORB(Oriented FAST and Rotated BRIEF)

As you might have noticed, that some of the above algorithms were also there in feature detection. These algorithms perform both feature detection and description. We will discuss each of these algorithms in detail in the next blogs.

Once we have the features and their descriptors, the next task is to match these features in the different images. This is known as Feature Matching. Below are some of the algorithms for this

  • Brute-Force Matcher
  • FLANN(Fast Library for Approximate Nearest Neighbors) Matcher

We will discuss each of these algorithms in detail in the next blogs. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Creating gif from video using OpenCV and imageio

In this blog, we will learn how to create gif from videos using OpenCV and imageio library. To install imageio library, simply do pip install imageio. So, let’s get started.

Steps

  • Open the video file using cv2.VideoCapture()
  • Read the frames one by one using the cap.read() method
  • Convert each frame to RGB. This is required because imageio accepts images in RGB format.
  • Save the frames in a list and close the video file
  • Convert the frames list to gif using the imageio.mimsave() method. Set the frames per second (fps) according to your application.

Below is the code for this
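A minimal sketch is shown below (the filenames and fps value are placeholders; note that the fps keyword follows the classic imageio v2 API, while newer versions may expect a duration argument instead):

import cv2
import imageio

cap = cv2.VideoCapture('video.mp4')
frames = []

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV reads frames in BGR; imageio expects RGB
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

cap.release()
imageio.mimsave('output.gif', frames, fps=25)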

This is how you convert video to gif. Now, let’s see how to convert a specific part of a video to gif.

Converting a specific part of a video to gif

There might be a case where instead of converting the entire video to a gif, you only want to convert a specific part of the video to a gif. There are several ways you can do this.

Approach 1

Using the fps of the video, we can easily calculate the starting and ending frame number and then extract all the frames lying between these two. Once the specific frames are extracted, we can easily convert them to gifs using imageio as discussed above. Below is the code for this where the frames are extracted from 20 seconds to 25 seconds.
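A sketch of this approach is shown below (the filenames are placeholders):

import cv2
import imageio

cap = cv2.VideoCapture('video.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)

start_frame = int(20 * fps)   # 20 seconds
end_frame = int(25 * fps)     # 25 seconds

frames = []
frame_no = 0
while True:
    ret, frame = cap.read()
    if not ret or frame_no > end_frame:
        break
    if frame_no >= start_frame:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    frame_no += 1

cap.release()
imageio.mimsave('clip.gif', frames, fps=int(fps))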

Approach 2

You can also save the frames manually by pressing some keys. For instance, you can start saving frames when key ‘s’ is pressed and stop saving when key ‘q’ is pressed. Once the specific frames are extracted, we can easily convert them to gifs using imageio as discussed above. Below is the code for this.
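A sketch of this approach is shown below (the filenames and the waitKey delay are placeholders):

import cv2
import imageio

cap = cv2.VideoCapture('video.mp4')
frames = []
saving = False

while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Video', frame)
    key = cv2.waitKey(25) & 0xFF
    if key == ord('s'):        # start saving frames
        saving = True
    elif key == ord('q'):      # stop saving and exit
        break
    if saving:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

cap.release()
cv2.destroyAllWindows()
imageio.mimsave('clip.gif', frames, fps=25)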

Approach 3

This approach is comparatively more tedious. In this, you go over each frame one by one and if you want to include that frame in gif you press the key ‘a’. To exit, you press the key ‘q’. Once the specific frames are extracted, we can easily convert them to gifs using imageio as discussed above. Below is the code for this.
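A sketch of this approach is shown below (the filenames and the output fps are placeholders):

import cv2
import imageio

cap = cv2.VideoCapture('video.mp4')
frames = []

while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Video', frame)
    key = cv2.waitKey(0) & 0xFF    # wait for a key press on every frame
    if key == ord('a'):            # include this frame in the gif
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    elif key == ord('q'):          # exit
        break

cap.release()
cv2.destroyAllWindows()
imageio.mimsave('clip.gif', frames, fps=5)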

This is how you can convert a specific part of a video to a gif. Hope you enjoy reading.

If you have any doubts/suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Template matching using OpenCV

In the previous blogs, we discussed different segmentation algorithms. Now, let’s explore another important computer vision area known as object detection. This simply means identifying and locating objects, that is, finding where an object is present in the image. There are various algorithms available for object detection. In this blog, let’s discuss one such algorithm known as template matching. So, let’s get started.

In template matching, we have a template image and we need to find where it occurs in the input image. For this, the sliding window approach is used. In the sliding window approach, we simply slide the template image over the input image (similar to convolution) and compare the overlapping patch. For comparison, you can use any method such as cross-correlation, squared difference, etc. The comparison methods provided by OpenCV are cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED, cv2.TM_CCORR, cv2.TM_CCORR_NORMED, cv2.TM_CCOEFF, and cv2.TM_CCOEFF_NORMED.

Here, I(x,y) denotes the input image, T(x,y) the template image, R(x,y) the result, and (w,h) the width and height of the template image.

This outputs a grayscale image, where each pixel represents how well the neighborhood of that pixel matches the template (i.e., the comparison score). From this, we can either select the maximum/minimum (depending on the comparison method used) or use thresholding to select the probable region of interest. This is how template matching works. Now, let’s see how to do this using OpenCV-Python.

OpenCV

OpenCV provides a built-in function cv2.matchTemplate() that implements the template matching algorithm. This takes as input the image, template and the comparison method and outputs the comparison result. The syntax is given below.
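In OpenCV-Python the call looks like this (the template and image must have the same data type and number of channels):

result = cv2.matchTemplate(image, template, method)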

Let’s take the below input and template image and see how this works.

  • Read both the input and the template image
  • Apply the template matching using cv2.matchTemplate()
  • If the method is cv2.TM_SQDIFF or cv2.TM_SQDIFF_NORMED, take the minimum, otherwise, take the maximum. This can be done using the cv2.minMaxLoc() function which finds the minimum and maximum element values and their positions.
  • Once the min/max position is found, we can easily draw the rectangle by taking this position as the top-left corner and using (w,h) of the template.
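A minimal sketch of these steps is shown below (the filenames are placeholders and cv2.TM_CCOEFF_NORMED is just one possible choice of method):

import cv2

img = cv2.imread('image.jpg')
template = cv2.imread('template.jpg')
h, w = template.shape[:2]

method = cv2.TM_CCOEFF_NORMED
result = cv2.matchTemplate(img, template, method)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

# for TM_SQDIFF / TM_SQDIFF_NORMED the best match is the minimum, otherwise the maximum
if method in (cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED):
    top_left = min_loc
else:
    top_left = max_loc

bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img, top_left, bottom_right, (0, 255, 0), 2)
cv2.imwrite('matched.jpg', img)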

Template Matching with Multiple Objects

But what if we have multiple instances of the object present in an image? Clearly, the above approach will not work, as it only finds a single instance of the object because we take the max/min location. One plausible approach is thresholding: instead of taking the max/min, we take all the locations with values greater/less than a certain threshold. This will not always work perfectly, but in some cases this approach can provide reasonable results.

Let’s take the below image and template to understand the multiple objects case.
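A sketch of the thresholding approach is shown below (the filenames and the threshold of 0.8 are placeholders):

import cv2
import numpy as np

img = cv2.imread('image.jpg')
template = cv2.imread('template.jpg')
h, w = template.shape[:2]

result = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)

# keep every location whose score exceeds the threshold
threshold = 0.8
locations = np.where(result >= threshold)
for pt in zip(*locations[::-1]):    # switch (row, col) to (x, y)
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)

cv2.imwrite('matched_multiple.jpg', img)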

Below is the resultant image.

Problems!!!

From the above result, you might have guessed the problems with this approach. Clearly, template matching handles translation. But as you can see from the above image, the hearts that are rotated or smaller in size are not detected. Thus, this algorithm is not rotation or scale-invariant. Other potential problems include occlusion, illumination and background changes, etc.

In the next blog, let’s improve this algorithm further and make it more robust against scale. For this, we will use the concept of image pyramids. See you in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.