
SIFT: Scale-Space Extrema Detection

In the previous blog, we had an overview of the SIFT algorithm. We discussed the different steps involved and the invariance it offers against scale, rotation, illumination, viewpoint, etc., but we didn’t discuss them in detail. So, in this blog, let’s start with the first step, which is scale-space extrema detection. So, let’s get started.

Before moving forward, let’s quickly recap what we are doing, why we are doing it, and how we are doing it.

  • We saw how corner detectors like Harris, Shi-Tomasi, etc. suffer when it comes to scaling. (Why we are doing this)
  • So, we want to detect features that are invariant to scale changes and can be robustly detected. (What we are doing)
  • This can be done by searching for stable features (extrema) across all possible scales, using a continuous function of scale known as scale space. Hence the name scale-space extrema detection. (How we are doing it)

I hope this is clear. Now, let’s understand what a scale-space is.

What is a Scale-Space?

As we all know, real-world objects are composed of different structures at different scales. For instance, the concept of a “tree” is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. It would make no sense to analyze leaves or molecules at the scale of the tree (meters). So, this means that you need to analyze everything at an appropriate scale in order to make sense of it.

But given an image of an unknown scene, there is no a priori way to determine which scales are appropriate for describing the interesting structures in the image data. Hence, the only reasonable approach is to consider descriptions at multiple scales. This representation of images at multiple scales constitutes a so-called scale-space.

How to construct a Scale-Space?

Now, the next question is how to construct a scale-space. As we know, if we increase the scale, the fine details are lost and only the coarser information prevails. Can you relate this to something you did in image processing? Does blurring an image with a low-pass filter sound similar? The answer is yes, but there is a catch: we can’t use just any low-pass filter. Only the Gaussian filter mimics a scale-space, because it has been shown to be the only filter that satisfies the following properties:

  • Linearity
  • Shift-invariance
  • The smoothing process does not produce new structures when going from a fine to a coarser scale
  • Rotational symmetry and some other properties (You can read about it on Wikipedia)

So to create a scale space, you take the original image and generate progressively blurred-out images using a Gaussian filter. Mathematically, the scale-space representation of an image can be expressed as
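
L(x, y, σ) = G(x, y, σ) ∗ I(x, y)

where ∗ denotes convolution and G(x, y, σ) = (1 / 2πσ²) exp(−(x² + y²) / 2σ²) is the 2-D Gaussian kernel. In this equation,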

  • L(x,y,σ) is the blurred image or scale space representation of an image
  • G(x,y,σ) is the Gaussian filter
  • I(x,y) is the image
  • σ is the scaling parameter or the amount of blur. As we increase σ, more and more details are removed from the image i.e. more blur

See below, where an image is shown at different scales (σ) (source: Wikipedia). Notice how the fine details are lost at larger scales.

So, I hope you now understand what a scale-space is and how to construct it using a Gaussian filter.

Scale-Space in SIFT

In the SIFT paper, the authors modified the scale-space representation. Instead of creating the scale-space representation for the original image only, they created the scale-space representations for different image sizes. This helps in increasing the number of keypoints detected. The idea is shown below

Take the original image and generate progressively blurred-out images. Then resize the original image to half its size and generate blurred-out images again. Keep repeating this process. This is shown below

Here, we use the term octave to denote the scale-space representation for a particular image size. For instance, all the images of the same size (arranged in a vertical line) form one octave. Here, we have 3 octaves, and each octave contains 4 images at different scales (blurred using a Gaussian filter).

Within an octave, adjacent scales differ by a constant factor k. If an octave contains s+1 images, then k = 2^(1/s). The first image has scale σ0, the second image has scale kσ0, the third image has scale k²σ0, and the last image has scale k^s·σ0. In the paper, the authors used number of octaves = 4, number of scale levels = 5, initial σ0 = 1.6, k = √2, etc.
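
To make this construction concrete, below is a minimal OpenCV-Python sketch of building such a Gaussian scale-space. It blurs each level directly from the octave’s base image (a simplification of the incremental blurring used in actual SIFT implementations), and the filename 'image.jpg' is just a placeholder:

import cv2

def build_scale_space(image, num_octaves=3, scales_per_octave=4, sigma0=1.6):
    # Constant multiplicative factor between adjacent scales: k = 2^(1/s)
    k = 2 ** (1.0 / (scales_per_octave - 1))
    octaves = []
    base = image.copy()
    for _ in range(num_octaves):
        octave = []
        for i in range(scales_per_octave):
            sigma = sigma0 * (k ** i)  # sigma0, k*sigma0, k^2*sigma0, ...
            # ksize=(0, 0) lets OpenCV pick the kernel size from sigma
            octave.append(cv2.GaussianBlur(base, (0, 0), sigmaX=sigma))
        octaves.append(octave)
        # Halve the image size for the next octave
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2))
    return octaves

# Example usage ('image.jpg' is a placeholder)
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
scale_space = build_scale_space(img)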

How to decide the number of octaves and number of scales per octave?

The number of octaves and scales depends on the size of the original image. You need to adjust these yourself depending upon the application.

But it has been found empirically that sampling 3 scales per octave provides optimal repeatability under downsampling/upsampling/rotation of the image as well as image noise. Also, adding more scales per octave increases the number of detected keypoints, but this does not improve the repeatability (in fact, there is a small decrease), so we settle for the computationally less expensive option. See the plot below

So, once we have constructed the scale-space, the next task is to detect the extrema in this scale-space. That’s why this step is called scale-space extrema detection. To keep this blog short, we will discuss this in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Shi-Tomasi Corner Detector

In the previous blog, we discussed the Harris Corner Detector and saw how it uses a score function to evaluate whether a point is a corner or not. But this algorithm doesn’t always yield satisfactory results. So, in 1994, J. Shi and C. Tomasi, in their paper Good Features to Track, made a small modification to it that gives better results than the Harris Corner Detector. So, let’s understand how they improved the algorithm.

As you might remember, the scoring function used by Harris Corner Detector is
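
R = λ1λ2 − k(λ1 + λ2)² = det(M) − k·(trace(M))²

where λ1 and λ2 are the eigenvalues of the matrix M and k is an empirically chosen constant (typically around 0.04–0.06).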

Instead of this, Shi-Tomasi proposed a new scoring function
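
R = min(λ1, λ2)

That is, the score is simply the smaller of the two eigenvalues of M.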

So, for a pixel, if this score R is greater than a certain threshold, then that pixel is considered a corner. Similar to the Harris Corner Detector, if we plot this in λ1−λ2 space, we get the plot below

So, as we can see,

  • only when both λ1 and λ2 are above a minimum value, λmin, is the point considered a corner (green region)
  • when only one of λ1 and λ2 is above the minimum value, λmin, it is considered an edge (orange region)
  • when both λ1 and λ2 are below the minimum value, λmin, it is considered a flat region (grey region)

So, this is the improvement that Shi-Tomasi did to the Harris Corner Detector. Other than this, the entire algorithm is the same. Now, let’s see how to implement this using OpenCV-Python.

OpenCV

OpenCV provides a built-in function cv2.goodFeaturesToTrack() that finds the N strongest corners in the image using either the Shi-Tomasi or the Harris Corner Detector. Below is the algorithm that this function uses

  • First, this function calculates the corner quality score at every pixel using either the Shi-Tomasi or the Harris corner measure.
  • Then this function performs non-maximum suppression (only the local maxima in a 3 x 3 neighborhood are retained).
  • After this, all the corners with a quality score less than qualityLevel × max(qualityScore(x, y)) are rejected, where max(qualityScore(x, y)) is the score of the best corner in the image. For instance, if the best corner has a quality score of 1500 and qualityLevel = 0.01, then all the corners with a quality score less than 15 are rejected.
  • Now, all the remaining corners are sorted by quality score in descending order.
  • Finally, the function throws away each corner for which there is a stronger corner at a distance less than minDistance.

Here is the syntax of this function
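
In OpenCV-Python, the call looks like this (optional arguments are shown in brackets):

corners = cv2.goodFeaturesToTrack(image, maxCorners, qualityLevel, minDistance[, corners[, mask[, blockSize[, useHarrisDetector[, k]]]]])

  • image – input single-channel (grayscale) image
  • maxCorners – maximum number of corners to return (the N strongest)
  • qualityLevel – minimal accepted quality of corners, relative to the best corner score
  • minDistance – minimum possible Euclidean distance between the returned corners
  • mask – optional region of interest
  • blockSize – size of the neighborhood used for computing the covariance matrix
  • useHarrisDetector – whether to use the Harris detector instead of Shi-Tomasi
  • k – free parameter of the Harris detector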

Now, let’s take the image we used in the previous blog and detect the top 20 corners. Below is the code for this
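
(In the sketch below, the filename 'image.jpg' is just a placeholder for that image, and the qualityLevel and minDistance values are illustrative choices.)

import cv2
import numpy as np

# Read the image and convert it to grayscale ('image.jpg' is a placeholder)
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect the 20 strongest corners using the Shi-Tomasi method
corners = cv2.goodFeaturesToTrack(gray, maxCorners=20, qualityLevel=0.01, minDistance=10)
corners = np.int32(corners)

# Mark each detected corner with a small filled circle
for c in corners:
    x, y = c.ravel()
    cv2.circle(img, (int(x), int(y)), 3, (0, 0, 255), -1)

cv2.imshow('Shi-Tomasi Corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()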

Below is the result of this

You can also use the Harris Corner Detector method by specifying the flag useHarrisDetector and the k parameter in the above function, as shown below
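
(k = 0.04 is a typical choice)

corners = cv2.goodFeaturesToTrack(gray, maxCorners=20, qualityLevel=0.01,
                                  minDistance=10, useHarrisDetector=True, k=0.04)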

So, that’s all about Shi-Tomasi Detector.

Limitations

Both the Shi-Tomasi and Harris Corner Detectors work well in most cases, but when the scale of the image changes, neither of these algorithms gives satisfactory results. So, in the next blog, we will discuss one of the most famous algorithms for finding scale-invariant features, known as SIFT (Scale-Invariant Feature Transform). This algorithm was a breakthrough in this field. See you in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Harris Corner Detection

In the previous blog, we discussed what features are and why corners are considered good features compared to edges and flat surfaces. In this blog, let’s discuss one of the most famous and commonly used corner detection methods, known as Harris Corner Detection. This was one of the earliest attempts to find corners, made by Chris Harris & Mike Stephens in their 1988 paper A Combined Corner and Edge Detector; it is now called the Harris Corner Detector. So, let’s first understand the basic idea behind this algorithm, and then we will dive into the mathematics. Let’s get started.

As discussed in the previous blog, corners are regions in the image with large variations in intensity in all directions. For instance, take a look at the image below. If you shift the window by a small amount, a corner will produce a significant change in all directions, an edge will show no change if we move the window along the edge direction, and a flat region will show no change in any direction.

So, the authors turned this simple idea of finding the difference in intensity for a displacement of (u,v) in all directions into a mathematical form. This is expressed as
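
E(u, v) = Σ w(x, y) [ I(x + u, y + v) − I(x, y) ]²

where the sum runs over all pixels (x, y) inside the window.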

Here,

  • the window function is either a rectangular window or a Gaussian window which gives weights to pixels underneath.
  • E(u,v) is the difference in intensities between the original and the moved window.

As can be clearly seen, for nearly constant patches the error function will be close to 0, while for distinctive patches it will be larger. Hence, our aim is to find patches where this error function is large. In other words, we need to maximize this error function for corner detection, which means we have to maximize the second term. We can do this by applying a Taylor expansion and a few mathematical steps, as shown below
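
For small displacements, a first-order Taylor expansion of the shifted image gives

I(x + u, y + v) ≈ I(x, y) + u·Ix + v·Iy

where Ix and Iy are the image derivatives (gradients) in the x and y directions. Substituting this into E(u, v), the constant terms cancel and we are left with

E(u, v) ≈ Σ w(x, y) (u·Ix + v·Iy)²

which can be rearranged into matrix form.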

So, the final equation becomes
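
E(u, v) ≈ [u  v] M [u  v]ᵀ

where

M = Σ w(x, y) [ Ix²     Ix·Iy ]
              [ Ix·Iy   Iy²   ]

and the sum again runs over all pixels inside the window.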

Then comes the main part. As we have already discussed, corners are the regions in the image with large variations in intensity in all directions. In terms of the matrix M above, we can say that “a corner is characterized by a large variation of M in all directions of the vector [u,v]”. If you remember, eigenvalues tell us about the variance, so by simply analyzing the eigenvalues of the matrix M we can infer the result.

But the authors noted that the exact computation of the eigenvalues is computationally expensive, since it requires the computation of a square root, and instead suggested the following score function, which determines whether a window contains a corner or not. This is shown below
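
R = det(M) − k·(trace(M))²

where det(M) = λ1λ2, trace(M) = λ1 + λ2, and k is an empirically determined constant (typically 0.04–0.06).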

Therefore, the algorithm does not have to actually compute the eigenvalue decomposition of the matrix M and instead it is sufficient to evaluate the determinant and trace of matrix M to find the corners.

Now, depending upon the magnitude of the eigenvalues and the score (R), we can decide whether a region is a corner, an edge, or flat.

  • When |R| is small, which happens when λ1 and λ2 are small, the region is flat.
  • When R<0, which happens when λ1>>λ2 or vice versa, the region is an edge.
    • If λ1>>λ2, then vertical edge
    • otherwise horizontal edge
  • When R is large, which happens when λ1 and λ2 are large and λ1∼λ2, the region is a corner

This can also be represented by the below image

So, this algorithm will give us a score corresponding to each pixel. Then we need to do thresholding in order to find the corners.

Because we consider only the eigenvalues of the matrix M, we are working with quantities that are also invariant to rotation, which is important because the objects we are tracking might rotate as well as move. So, this makes the algorithm rotation invariant.

So, this concludes the Harris Corner Detector. I hope you understood this. Now, let’s see how to do this using OpenCV-Python.

OpenCV

OpenCV provides a built-in function cv2.cornerHarris() that runs the Harris Corner Detector on the image. Below is the syntax for this.
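
dst = cv2.cornerHarris(src, blockSize, ksize, k)

  • src – input single-channel 8-bit or floating-point image
  • blockSize – neighborhood size considered for corner detection
  • ksize – aperture parameter of the Sobel operator used to compute the image derivatives
  • k – Harris detector free parameter (the k in the score function)
  • dst – output float32 image of Harris scores, the same size as src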

For each pixel (x,y), it calculates a 2×2 gradient covariance matrix M(x,y) over a blockSize×blockSize neighborhood. Then, using this matrix M, it calculates the score for each pixel. Below is the code for this
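
(In the sketch below, the filename 'image.jpg' and the threshold of 1% of the maximum score are illustrative choices.)

import cv2
import numpy as np

# Read the image and convert it to grayscale ('image.jpg' is a placeholder)
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# cornerHarris expects a float32 input
gray = np.float32(gray)

# blockSize = 2, Sobel aperture ksize = 3, Harris parameter k = 0.04
dst = cv2.cornerHarris(gray, 2, 3, 0.04)

# Threshold the score image and mark the detected corners in red
img[dst > 0.01 * dst.max()] = [0, 0, 255]

cv2.imshow('Harris Corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()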

Below is the result of this.

So, this is how you can implement the Harris Corner Detector using OpenCV-Python. I hope you understood this. In the next blog, we will discuss the Shi-Tomasi algorithm that improves this Harris Corner Detector even further. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.