
SIFT: Scale-Space Extrema Detection

In the previous blog, we had an overview of the SIFT algorithm. We discussed the different steps involved and the invariance it offers against scale, rotation, illumination, viewpoint, etc., but we didn’t discuss them in detail. So, in this blog, let’s start with the first step, which is scale-space extrema detection. So, let’s get started.

Before moving forward, let’s quickly recap what we are doing, why we are doing it, and how we are doing it.

  • We saw how corner detectors like Harris, Shi-Tomasi, etc. suffer when it comes to scaling. (Why we are doing this)
  • So, we want to detect features that are invariant to scale changes and can be robustly detected. (What we are doing)
  • This can be done by searching for stable features (extrema) across all possible scales, using a continuous function of scale known as scale space. Hence the name scale-space extrema detection. (How we are doing it)

I hope this is clear. Now, let’s understand what a scale-space is.

What is a Scale-Space?

As we all know, real-world objects are composed of different structures at different scales. For instance, the concept of a “tree” is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. It would make no sense to analyze leaves or molecules at the scale of the tree (meters). So, this means that you need to analyze everything at an appropriate scale in order to make sense of it.

But given an image of an unknown scene, there is no a priori way to determine which scales are appropriate for describing the interesting structures in the image data. Hence, the only reasonable approach is to consider descriptions at multiple scales. This representation of images at multiple scales constitutes a so-called scale-space.

How to construct a Scale-Space?

Now, the next question is how to construct a scale-space. As we know, if we increase the scale, the fine details are lost and only the coarser information prevails. Can you relate this to something you did in image processing? Does blurring an image with a low-pass filter sound similar? The answer is yes, but there is a catch: we can’t use just any low-pass filter. Only the Gaussian filter mimics a scale-space, because it has been shown to be the only filter that satisfies the following properties:

  • Linearity
  • Shift-invariance
  • The smoothing process does not produce new structures when going from a fine to a coarser scale
  • Rotational symmetry and some other properties (You can read about it on Wikipedia)

So to create a scale space, you take the original image and generate progressively blurred-out images using a Gaussian filter. Mathematically, the scale-space representation of an image can be expressed as
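
L(x, y, σ) = G(x, y, σ) ∗ I(x, y)

where ∗ denotes convolution and G(x, y, σ) = (1 / 2πσ²) exp(−(x² + y²) / 2σ²) is the 2-D Gaussian kernel. In this equation,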

  • L(x,y,σ) is the blurred image or scale space representation of an image
  • G(x,y,σ) is the Gaussian filter
  • I(x,y) is the image
  • σ is the scaling parameter or the amount of blur. As we increase σ, more and more details are removed from the image i.e. more blur

See below, where an image is shown at different scales (σ) (source: Wikipedia). Notice how the fine details are lost at larger scales.

So, I hope you now understand what a scale-space is and how to construct it using a Gaussian filter.

Scale-Space in SIFT

In the SIFT paper, the authors modified the scale-space representation. Instead of creating the scale-space representation for the original image only, they created the scale-space representations for different image sizes. This helps in increasing the number of keypoints detected. The idea is shown below

Take the original image and generate progressively blurred-out images. Then resize the original image to half its size and generate blurred-out images again. Keep repeating this process. This is shown below

Here, we use the term octave to denote the scale-space representation for a particular image size. For instance, all the images of the same size (arranged in a vertical line) form one octave. Here, we have 3 octaves, and each octave contains 4 images at different scales (blurred using a Gaussian filter).

Within an octave, adjacent scales differ by a constant factor k. If an octave contains s+1 images, then k = 2^(1/s). The first image has scale σ0, the second image has scale kσ0, the third image has scale k²σ0, and the last image has scale k^s·σ0. In the paper, the authors used number of octaves = 4, number of scale levels = 5, initial σ0 = 1.6, k = √2, etc.
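
To make this construction concrete, below is a minimal OpenCV-Python sketch of building such a Gaussian scale-space. It blurs each level directly from the octave’s base image (a simplification of the incremental blurring used in actual SIFT implementations), and the filename 'image.jpg' is just a placeholder:

import cv2

def build_scale_space(image, num_octaves=3, scales_per_octave=4, sigma0=1.6):
    # Constant multiplicative factor between adjacent scales: k = 2^(1/s)
    k = 2 ** (1.0 / (scales_per_octave - 1))
    octaves = []
    base = image.copy()
    for _ in range(num_octaves):
        octave = []
        for i in range(scales_per_octave):
            sigma = sigma0 * (k ** i)  # sigma0, k*sigma0, k^2*sigma0, ...
            # ksize=(0, 0) lets OpenCV pick the kernel size from sigma
            octave.append(cv2.GaussianBlur(base, (0, 0), sigmaX=sigma))
        octaves.append(octave)
        # Halve the image size for the next octave
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2))
    return octaves

# Example usage ('image.jpg' is a placeholder)
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
scale_space = build_scale_space(img)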

How to decide the number of octaves and number of scales per octave?

The number of octaves and scales depends on the size of the original image. You need to adjust these yourself depending upon the application.

But it has been found empirically that sampling 3 scales per octave provides optimal repeatability under downsampling/upsampling/rotation of the image as well as image noise. Also, adding more scales per octave increases the number of detected keypoints, but this does not improve the repeatability (in fact, there is a small decrease), so we settle for the computationally less expensive option. See the plot below

So, once we have constructed the scale-space, the next task is to detect the extrema in this scale-space. That’s why this step is called scale-space extrema detection. To keep this blog short, we will discuss this in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Shi-Tomasi Corner Detector

In the previous blog, we discussed the Harris Corner Detector and saw how it uses a score function to evaluate whether a point is a corner or not. But this algorithm doesn’t always yield satisfactory results. So, in 1994, J. Shi and C. Tomasi, in their paper Good Features to Track, made a small modification to it that gives better results than the Harris Corner Detector. So, let’s understand how they improved the algorithm.

As you might remember, the scoring function used by Harris Corner Detector is
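
R = λ1λ2 − k(λ1 + λ2)² = det(M) − k·(trace(M))²

where λ1 and λ2 are the eigenvalues of the matrix M and k is an empirically chosen constant (typically around 0.04–0.06).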

Instead of this, Shi-Tomasi proposed a new scoring function
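
R = min(λ1, λ2)

That is, the score is simply the smaller of the two eigenvalues of M.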

So, for a pixel, if this score R is greater than a certain threshold, then that pixel is considered a corner. Similar to the Harris Corner Detector, if we plot this in λ1−λ2 space, we get the plot below

So, as we can see,

  • only when both λ1 and λ2 are above a minimum value, λmin, is the point considered a corner (green region)
  • when only one of λ1 and λ2 is above the minimum value, λmin, it is considered an edge (orange region)
  • when both λ1 and λ2 are below the minimum value, λmin, it is considered a flat region (grey region)

So, this is the improvement that Shi-Tomasi did to the Harris Corner Detector. Other than this, the entire algorithm is the same. Now, let’s see how to implement this using OpenCV-Python.

OpenCV

OpenCV provides a built-in function cv2.goodFeaturesToTrack() that finds the N strongest corners in the image using either the Shi-Tomasi or the Harris Corner Detector. Below is the algorithm that this function uses

  • First, this function calculates the corner quality score at every pixel using either the Shi-Tomasi or the Harris corner measure.
  • Then this function performs non-maximum suppression (only the local maxima in a 3 x 3 neighborhood are retained).
  • After this, all the corners with a quality score less than qualityLevel × max(qualityScore(x, y)) are rejected, where max(qualityScore(x, y)) is the score of the best corner in the image. For instance, if the best corner has a quality score of 1500 and qualityLevel = 0.01, then all the corners with a quality score less than 15 are rejected.
  • Now, all the remaining corners are sorted by quality score in descending order.
  • Finally, the function throws away each corner for which there is a stronger corner at a distance less than minDistance.

Here is the syntax of this function
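
In OpenCV-Python, the call looks like this (optional arguments are shown in brackets):

corners = cv2.goodFeaturesToTrack(image, maxCorners, qualityLevel, minDistance[, corners[, mask[, blockSize[, useHarrisDetector[, k]]]]])

  • image – input single-channel (grayscale) image
  • maxCorners – maximum number of corners to return (the N strongest)
  • qualityLevel – minimal accepted quality of corners, relative to the best corner score
  • minDistance – minimum possible Euclidean distance between the returned corners
  • mask – optional region of interest
  • blockSize – size of the neighborhood used for computing the covariance matrix
  • useHarrisDetector – whether to use the Harris detector instead of Shi-Tomasi
  • k – free parameter of the Harris detector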

Now, let’s take the image we used in the previous blog and detect the top 20 corners. Below is the code for this
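
(In the sketch below, the filename 'image.jpg' is just a placeholder for that image, and the qualityLevel and minDistance values are illustrative choices.)

import cv2
import numpy as np

# Read the image and convert it to grayscale ('image.jpg' is a placeholder)
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect the 20 strongest corners using the Shi-Tomasi method
corners = cv2.goodFeaturesToTrack(gray, maxCorners=20, qualityLevel=0.01, minDistance=10)
corners = np.int32(corners)

# Mark each detected corner with a small filled circle
for c in corners:
    x, y = c.ravel()
    cv2.circle(img, (int(x), int(y)), 3, (0, 0, 255), -1)

cv2.imshow('Shi-Tomasi Corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()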

Below is the result of this

You can also use the Harris Corner Detector method by specifying the flag useHarrisDetector and the k parameter in the above function, as shown below
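
(k = 0.04 is a typical choice)

corners = cv2.goodFeaturesToTrack(gray, maxCorners=20, qualityLevel=0.01,
                                  minDistance=10, useHarrisDetector=True, k=0.04)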

So, that’s all about Shi-Tomasi Detector.

Limitations

Both the Shi-Tomasi and Harris Corner Detectors work well in most cases, but when the scale of the image changes, neither of these algorithms gives satisfactory results. So, in the next blog, we will discuss one of the most famous algorithms for finding scale-invariant features, known as SIFT (Scale-Invariant Feature Transform). This algorithm was a breakthrough in this field. See you in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Harris Corner Detection

In the previous blog, we discussed what features are and why corners are considered good features compared to edges and flat surfaces. In this blog, let’s discuss one of the most famous and commonly used corner detection methods, known as Harris Corner Detection. This was one of the earliest attempts to find corners, made by Chris Harris & Mike Stephens in their 1988 paper A Combined Corner and Edge Detector; it is now called the Harris Corner Detector. So, let’s first understand the basic idea behind this algorithm, and then we will dive into the mathematics. Let’s get started.

As discussed in the previous blog, corners are regions in the image with large variations in intensity in all directions. For instance, take a look at the image below. If you shift the window by a small amount, a corner will produce a significant change in all directions, an edge will show no change if we move the window along the edge direction, and a flat region will show no change in any direction.

So, the authors turned this simple idea of finding the difference in intensity for a displacement of (u,v) in all directions into a mathematical form. This is expressed as
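
E(u, v) = Σ w(x, y) [ I(x + u, y + v) − I(x, y) ]²

where the sum runs over all pixels (x, y) inside the window.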

Here,

  • the window function is either a rectangular window or a Gaussian window which gives weights to pixels underneath.
  • E(u,v) is the difference in intensities between the original and the moved window.

As can be clearly seen, for nearly constant patches the error function will be close to 0, while for distinctive patches it will be larger. Hence, our aim is to find patches where this error function is large. In other words, we need to maximize this error function for corner detection, which means we have to maximize the second term. We can do this by applying a Taylor expansion and a few mathematical steps, as shown below
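
For small displacements, a first-order Taylor expansion of the shifted image gives

I(x + u, y + v) ≈ I(x, y) + u·Ix + v·Iy

where Ix and Iy are the image derivatives (gradients) in the x and y directions. Substituting this into E(u, v), the constant terms cancel and we are left with

E(u, v) ≈ Σ w(x, y) (u·Ix + v·Iy)²

which can be rearranged into matrix form.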

So, the final equation becomes
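
E(u, v) ≈ [u  v] M [u  v]ᵀ

where

M = Σ w(x, y) [ Ix²     Ix·Iy ]
              [ Ix·Iy   Iy²   ]

and the sum again runs over all pixels inside the window.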

Then comes the main part. As we have already discussed, corners are the regions in the image with large variations in intensity in all directions. In terms of the matrix M above, we can say that “a corner is characterized by a large variation of M in all directions of the vector [u,v]”. If you remember, eigenvalues tell us about the variance, so by simply analyzing the eigenvalues of the matrix M we can infer the result.

But the authors noted that the exact computation of the eigenvalues is computationally expensive, since it requires the computation of a square root, and instead suggested the following score function, which determines whether a window contains a corner or not. This is shown below
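
R = det(M) − k·(trace(M))²

where det(M) = λ1λ2, trace(M) = λ1 + λ2, and k is an empirically determined constant (typically 0.04–0.06).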

Therefore, the algorithm does not have to actually compute the eigenvalue decomposition of the matrix M and instead it is sufficient to evaluate the determinant and trace of matrix M to find the corners.

Now, depending upon the magnitude of the eigenvalues and the score (R), we can decide whether a region is a corner, an edge, or flat.

  • When |R| is small, which happens when λ1 and λ2 are small, the region is flat.
  • When R<0, which happens when λ1>>λ2 or vice versa, the region is an edge.
    • If λ1>>λ2, then vertical edge
    • otherwise horizontal edge
  • When R is large, which happens when λ1 and λ2 are large and λ1∼λ2, the region is a corner

This can also be represented by the below image

So, this algorithm will give us a score corresponding to each pixel. Then we need to do thresholding in order to find the corners.

Because we consider only the eigenvalues of the matrix M, we are working with quantities that are also invariant to rotation, which is important because the objects we are tracking might rotate as well as move. So, this makes the algorithm rotation invariant.

So, this concludes the Harris Corner Detector. I hope you understood this. Now, let’s see how to do this using OpenCV-Python.

OpenCV

OpenCV provides a built-in function cv2.cornerHarris() that runs the Harris Corner Detector on the image. Below is the syntax for this.
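
dst = cv2.cornerHarris(src, blockSize, ksize, k)

  • src – input single-channel 8-bit or floating-point image
  • blockSize – neighborhood size considered for corner detection
  • ksize – aperture parameter of the Sobel operator used to compute the image derivatives
  • k – Harris detector free parameter (the k in the score function)
  • dst – output float32 image of Harris scores, the same size as src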

For each pixel (x,y), it calculates a 2×2 gradient covariance matrix M(x,y) over a blockSize×blockSize neighborhood. Then, using this matrix M, it calculates the score for each pixel. Below is the code for this
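
(In the sketch below, the filename 'image.jpg' and the threshold of 1% of the maximum score are illustrative choices.)

import cv2
import numpy as np

# Read the image and convert it to grayscale ('image.jpg' is a placeholder)
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# cornerHarris expects a float32 input
gray = np.float32(gray)

# blockSize = 2, Sobel aperture ksize = 3, Harris parameter k = 0.04
dst = cv2.cornerHarris(gray, 2, 3, 0.04)

# Threshold the score image and mark the detected corners in red
img[dst > 0.01 * dst.max()] = [0, 0, 255]

cv2.imshow('Harris Corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()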

Below is the result of this.

So, this is how you can implement the Harris Corner Detector using OpenCV-Python. I hope you understood this. In the next blog, we will discuss the Shi-Tomasi algorithm that improves this Harris Corner Detector even further. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.