
Image Processing Quiz-7

Q1. What is a separable filter?

  1. a filter which can be written as a product of two simpler filters
  2. a filter that can separate noise from other features
  3. a filter which can be written as a sum of two simpler filters
  4. There is no such term!!!

Answer: 1
Explanation: A separable filter is one whose kernel can be written as the product of two simpler (typically one-dimensional) filters.
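
For instance, here is a small NumPy sketch showing a standard 3×3 smoothing kernel built as the product (outer product) of two 1-D filters; the kernel values are just the usual binomial weights:

```python
import numpy as np

# A simple 1-D smoothing filter
g = np.array([1, 2, 1]) / 4.0

# The 2-D kernel is the outer product (column vector times row vector)
# of two 1-D filters: [[1, 2, 1], [2, 4, 2], [1, 2, 1]] / 16
kernel_2d = np.outer(g, g)
```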

Q2. Which of the following OpenCV functions can be used to perform Adaptive Thresholding?

  1. cv2.adaptiveThreshold()
  2. cv2.threshold()
  3. cv2.adaptThreshold()
  4. cv2.thresh()

Answer: 1
Explanation: In OpenCV, cv2.adaptiveThreshold() can be used to perform Adaptive Thresholding. Refer to this link to know more.
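
A minimal usage sketch (the filename and the blockSize/C values below are only illustrative):

```python
import cv2

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)
# blockSize=11 and C=2 are typical starting values
thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)
```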

Q3. In some cases, the Adaptive Histogram Equalization technique tends to over-amplify the noise. Which technique is used to solve this problem?

  1. Histogram Specification
  2. CLAHE
  3. SWAHE
  4. All of the above

Answer: 2
Explanation: The Adaptive Histogram Equalization (AHE) technique tends to over-amplify noise, so contrast limiting is applied to avoid this; the resulting method is known as Contrast Limited Adaptive Histogram Equalization (CLAHE).

Q4. What is Ringing effect in image processing?

  1. a rippling artifact near sharp edges
  2. a rippling artifact in smooth areas
  3. In this, the filter rings(warns) about the noise
  4. There is no such effect!!!

Answer: 1
Explanation: In image processing, ringing effect refers to a rippling artifact near sharp edges. To know more about this effect, refer to this link.

Q5. What is local contrast enhancement?

  1. In this, we divide the image into small regions and then perform contrast enhancement on these regions independently
  2. In this, we divide the image into small regions and then perform contrast enhancement on all these regions using same transformation function
  3. In this, we simply perform contrast enhancement on the entire image
  4. None of the above

Answer: 1
Explanation: As clear from the name, in local contrast enhancement we divide the image into small regions and then perform contrast enhancement on these regions independently. The transformation function for this is derived from the neighborhood of every pixel in the image.

Q6. In the Gaussian filter, what is the relation between standard deviation and blurring?

  1. The larger the standard deviation, the more the blurring
  2. The larger the standard deviation, the less the blurring
  3. No relation!!!

Answer: 1
Explanation: In a Gaussian filter, the larger the standard deviation, the more the blurring.

Q7. Which of the following are the common uses of image gradients?

  1. Edge detection
  2. Feature Matching
  3. Both of the above
  4. None of the above

Answer: 3
Explanation: Image gradients can be used for both Edge Detection (for instance in Canny Edge Detector) and Feature Matching.

Q8. How does a change in filter size affect blurring? Assume the filter is a smoothing filter.

  1. Blurring increases with decrease in the filter size
  2. Blurring decreases with increase in the filter size
  3. Blurring increases with increase in the filter size
  4. There is no effect of filter size on blurring!!!

Answer: 3
Explanation: As we increase the filter size, blurring also increases.

Image Processing Quiz-6

Q1. What is a and b in Lab color space?

  1. a: Red/Green Value, b: Blue/Yellow Value
  2. a: Red/Blue Value, b: Green/Yellow Value
  3. a: Blue/Yellow Value, b: Red/Green Value
  4. None of the above

Answer: 1
Explanation: In Lab, L* stands for perceptual lightness, and a* and b* for the four unique colors of human vision: red, green, blue, and yellow. Refer to this link to know more.

Q2. In Nearest Neighbor, how many neighboring pixels are considered for calculating the intensity value for a new location?

  1. 3
  2. 1
  3. 2
  4. 0

Answer: 2
Explanation: As the name suggests, this method considers only the nearest neighbor, i.e. 1 pixel, for calculating the intensity value at a new location. To know more about Nearest Neighbor, refer to this link.

Q3. _______ is used to keep the output image size the same as the input image during the convolution operation?

  1. Padding
  2. Interpolation
  3. Dilation
  4. Erosion

Answer: 1
Explanation: Padding refers to the process of adding borders to an image, generally with 0-valued pixels. Because the image size decreases during convolution, we pad the original image to prevent this.

Q4. What do you mean by Affine Transformation?

  1. a geometric transformation that preserves collinearity and parallelism
  2. a geometric transformation that preserves distances and angles but not collinearity
  3. transformation that is associated with the change in viewpoint
  4. a geometric transformation that preserves collinearity and distance but not parallelism

Answer: 1
Explanation: An affine transformation is any transformation that preserves collinearity, parallelism as well as the ratio of distances between the points (e.g. midpoint of a line remains the midpoint after transformation). It doesn’t necessarily preserve distances and angles. Refer to this link to know more.

Q5. Can we sharpen an image using a smoothing filter?

  1. Yes
  2. No

Answer: 1
Explanation: Yes, we can sharpen an image using a smoothing filter. For instance, both the Unsharp Masking and Difference of Gaussian techniques sharpen an image using a smoothing filter.

Q6. What do you mean by Domain filters?

  1. in which the filter weights are assigned according to the spatial closeness
  2. in which the filter weights are assigned according to the intensity difference
  3. in which the filter weights are assigned both according to the spatial closeness and intensity difference

Answer: 1
Explanation: As the name suggests, Domain filters are the ones in which the filter weights are assigned according to spatial closeness (i.e. the domain).

Q7. Which of the following histogram techniques can be used for image segmentation?

  1. Histogram Equalization
  2. CLAHE
  3. Histogram Backprojection
  4. Histogram Specification

Answer: 3
Explanation: Histogram Backprojection can be used for image segmentation. To know more about Histogram Backprojection, refer to this link.

Q8. In general, the gradient in x-direction will find ________?

  1. Horizontal edges
  2. Vertical edges
  3. Any type of edges
  4. Gradient has no relation with edges

Answer: 2
Explanation: Because the gradient refers to the directional change in intensity, the gradient in the x-direction will find vertical edges. Refer to this link to know more.

Image Processing Quiz-5

Q1. Which of the following denotes 255 in the binary form?

  1. 11111111
  2. 00000000
  3. 01111111
  4. 11111110

Answer: 1
Explanation: In binary form, 255 is represented as 11111111. To know more about how to convert decimal to binary, refer to this link.

Q2. Which of the following OpenCV functions can be used to perform CLAHE?

  1. First we create a CLAHE object using “cv2.createCLAHE()” and then we apply this on the image using .apply() method
  2. cv2.applyCLAHE()
  3. cv2.clahe()
  4. None of the above

Answer: 1
Explanation: In OpenCV, we first create a CLAHE object using cv2.createCLAHE() and then apply it to the image using the .apply() method. Refer to this link to know more about this function.
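
A minimal usage sketch (the filename and the clipLimit/tileGridSize values are only illustrative):

```python
import cv2

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)
# Create a CLAHE object and apply it to the image
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
equalized = clahe.apply(img)
```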

Q3. What is the smallest element of an image?

  1. pixel
  2. dpi
  3. meter
  4. centimeter

Answer: 1
Explanation: In digital image processing, a pixel (or picture element) is the smallest item of information in an image.

Q4. Which of the following OpenCV functions can be used to apply an affine transformation to an image?

  1. cv2.warpAffine()
  2. cv2.affineTransform()
  3. cv2.applyAffine()
  4. cv2.WarpAffine()

Answer: 1
Explanation: In OpenCV, cv2.warpAffine() can be used to apply an affine transformation to an image. Refer to this link to know more.
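
A minimal usage sketch (the filename and the translation values are only illustrative; any 2×3 affine matrix works here):

```python
import cv2
import numpy as np

img = cv2.imread('input.jpg')
rows, cols = img.shape[:2]
# A 2x3 affine matrix that translates the image by (50, 30)
M = np.float32([[1, 0, 50],
                [0, 1, 30]])
shifted = cv2.warpAffine(img, M, (cols, rows))
```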

Q5. Which of the following is a subtractive color model?

  1. RGB
  2. CMYK
  3. Both of the above
  4. None of the above

Answer: 2
Explanation: In a subtractive model, colors are perceived as a result of reflected light. For instance, cyan is the complement of red, meaning that cyan serves as a filter that absorbs red. The amount of cyan applied to a white sheet of paper controls how much of the red in white light will be reflected back from the paper. To know more about subtractive models, refer to this link.

Q6. What type of filters results in image sharpening?

  1. Low Pass filters
  2. High Pass filters

Answer: 2
Explanation: Because high pass filters enhance the high-frequency parts of an image, they result in image sharpening.

Q7. For a skewed image histogram, which technique can be used for improving the global contrast?

  1. Histogram Equalization
  2. Histogram Matching
  3. Histogram Balancing
  4. None of the above

Answer: 2
Explanation: For a skewed image histogram, one reasonable approach is to manually specify a transformation function that preserves the general shape of the original histogram but has a smoother transition of intensity levels in the skewed areas. This is what we do in Histogram Matching.

Q8. What does the term “Shadows” refer to in a 1D image histogram?

  1. Leftmost part (the black and dark areas)
  2. Rightmost part (light and pure white areas)
  3. Center part (medium grey areas)
  4. There is no such term!!!

Answer: 1
Explanation: Shadows, as the name suggests, refers to the leftmost part of the histogram, which contains mostly the black and dark areas. To know more about Image Histograms, refer to this link.

Image Processing Quiz-4

Q1. Which of the following are the main steps used in Canny Edge Detector?

  1. Noise Reduction, Finding Intensity Gradient, Non-max Suppression, Hysteresis Thresholding
  2. Noise Reduction, Detecting contours, Hysteresis Thresholding
  3. Noise Reduction, Detecting contours, Non-max Suppression
  4. Noise Reduction, Non-max Suppression, Hysteresis Thresholding

Answer: 1
Explanation: The main steps used in Canny Edge Detector are Noise Reduction, Finding Intensity Gradient, Non-max Suppression, Hysteresis Thresholding. Refer to this link to know more.

Q2. Which of the following filters assigns more weight to the nearest pixels compared to the distant pixels?

  1. Gaussian Filter
  2. Box Filter
  3. Median Filter
  4. All of the above

Answer: 1
Explanation: A Gaussian filter assigns more weights to the nearest pixels as compared to the distant pixels. To know more about Gaussian filter, refer to this link.

Q3. How many thresholds are used in hysteresis thresholding in Canny Edge Detector?

  1. 2
  2. 1
  3. 3
  4. 4

Answer: 1
Explanation: To solve the problem of “which edges are really edges and which are not” Canny uses the Hysteresis thresholding. In this, we set two thresholds ‘High’ and ‘Low’. Refer to this link to know more.

Q4. In Bilinear Interpolation, how many neighboring pixels are considered for calculating the intensity value for a new location?

  1. 3
  2. 1
  3. 2
  4. 4

Answer: 4
Explanation: Bi-linear interpolation means applying a linear interpolation in two directions. Thus, it uses 4 nearest neighbors for calculating the intensity value for a new location. To know more about Bilinear Interpolation, refer to this link.

Q5. Which of the following image sharpening techniques should be used if the image contains a high degree of noise?

  1. Sobel
  2. Laplacian
  3. Difference of Gaussian
  4. Scharr

Answer: 3
Explanation: Because in Difference of Gaussian we first blur the image, which reduces the effect of noise to a great extent.

Q6. What response does a second-order derivative filter give along ramps?

  1. Zero
  2. positive
  3. negative

Answer: 1
Explanation: Because a ramp has a constant slope, the first-order derivative gives a constant response along the ramp while the second-order derivative gives a zero response.

Q7. Which of the following OpenCV functions can be used to perform Hit-or-Miss Transform?

  1. cv2.morphologyEx(img, cv2.MORPH_HITMISS, kernel)
  2. cv2.morphologyEx(img, cv2.MORPH_HITORMISS, kernel)
  3. cv2.morphHitMiss()
  4. cv2.hitMiss()

Answer: 1
Explanation: In OpenCV, cv2.morphologyEx(img, cv2.MORPH_HITMISS, kernel) can be used to perform Hit-or-Miss Transform. Refer to this link to know more.
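
A minimal sketch (the tiny test image and the kernel pattern are only illustrative; in the kernel, 1 means the pixel must belong to the foreground, -1 means it must belong to the background, and 0 means don't care):

```python
import cv2
import numpy as np

# A tiny binary test image with two isolated foreground pixels
img = np.array([[0,   0,   0,   0],
                [0, 255,   0,   0],
                [0,   0,   0,   0],
                [0,   0, 255,   0]], dtype=np.uint8)

# Pattern for an isolated pixel surrounded by background
kernel = np.array([[ 0, -1,  0],
                   [-1,  1, -1],
                   [ 0, -1,  0]], dtype=np.int32)

out = cv2.morphologyEx(img, cv2.MORPH_HITMISS, kernel)
```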

Q8. Which of the following Morphological operations closes the holes/gaps present in the object while keeping the initial object size same?

  1. Dilation
  2. Erosion
  3. Closing
  4. Opening

Answer: 3
Explanation: Dilation expands the object region, so it can bridge/close the holes, but to keep the initial object size the same we also need Erosion. Since Closing is Dilation followed by Erosion, it can be used for this.

Image Processing Quiz-3

Q1. In an 8-bit color image, the intensity value (255,255,255) corresponds to which color? (Consider the RGB color model here)

  1. Black
  2. White
  3. Red
  4. Cyan

Answer: 2
Explanation: Because RGB is an additive color model, i.e. the colors present in the light add to form new colors, zero intensity for each component (R, G, B) gives the darkest color (no light, considered black), and full intensity of each gives white. Since for an 8-bit image the full intensity value is 255, the answer is White. Refer to this link to know more.

Q2. Which of the following techniques can be used for blur detection or detecting blurred images?

  1. Variance of Laplacian
  2. Unsharp Masking
  3. High Boost filtering
  4. All of the above

Answer: 1
Explanation: A blurry image doesn’t have well-defined edges, so if you calculate the Laplacian of such an image, you will get more or less the same response everywhere. In other words, the variance of this Laplacian image will be low. So, this can be used for blur detection.

Q3. Which of the following can be a reason for a low contrast image?

  1. Poor illumination of the scene
  2. wrong setting of lens aperture during image acquisition
  3. lack of dynamic range in the imaging sensor
  4. All of the above

Answer: 4
Explanation: All of the above can be possible reasons for getting a low contrast image.

Q4. For which type of images can the Histogram Equalization technique be used?

  1. thermal images
  2. Satellite images
  3. X-ray images
  4. All of the above

Answer: 4
Explanation: Because Histogram Equalization is a contrast enhancement method, it can be used on all of the above images.

Q5. Which of the following OpenCV functions can be used to perform the Dilation operation?

  1. cv2.dilate()
  2. cv2.Dilate()
  3. cv2.dilate2D()
  4. cv2.morphDilate()

Answer: 1
Explanation: In OpenCV, cv2.dilate() can be used to perform Dilation operation. Refer to this link to know more.

Q6. Generally, in a 1D image histogram, what do the x and y axes represent?

  1. x-axis: Intensity values, Y-axis: no. of pixels corresponding to intensity values
  2. x-axis: no. of pixels corresponding to intensity values, Y-axis: Intensity values
  3. x-axis: pixel location, y-axis: Intensity values
  4. x-axis: Intensity values, y-axis: pixel location

Answer: 1
Explanation: In a 1D image histogram, we plot the intensity values on the x-axis and the number of pixels corresponding to each intensity value on the y-axis. To know more about Image Histograms, refer to this link.

Q7. Which of the following is the general form of the log transformation? Suppose r and s denote the input and output pixel values respectively.

  1. s = c·log10(1/r)
  2. s = c·log10(1 + r)
  3. s = c·log10(1 - r)
  4. s = c·log10(1 * r)

Answer: 2
Explanation: Log transformation means replacing each pixel value with its logarithm. The general form of the log transformation function is s = c·log10(1 + r). Refer to this link to know more.
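
A minimal sketch (the filename is only illustrative; here c is chosen so that the output spans the 8-bit range):

```python
import cv2
import numpy as np

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)
r = img.astype(np.float64)
c = 255 / np.log10(1 + r.max())               # scale factor for an 8-bit output
s = (c * np.log10(1 + r)).astype(np.uint8)    # s = c*log10(1 + r)
```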

Q8. What is a high pass filter?

  1. a filter that enhances the high-frequency parts of an image
  2. a filter that enhances the low-frequency parts of an image

Answer: 1
Explanation: A high pass filter is the one that enhances the high-frequency parts of an image.

Image Processing Quiz-2

Q1. Which of the following Morphological operations closes the holes/gaps present in the object while keeping the initial object size the same?

  1. Dilation
  2. Erosion
  3. Closing
  4. Opening

Answer: 3
Explanation: Dilation expands the object region, so it can bridge/close the holes, but to keep the initial object size the same we also need Erosion. Since Closing is Dilation followed by Erosion, it can be used for this.

Q2. Which of the following OpenCV functions can be used to threshold an image?

  1. cv2.threshold()
  2. cv2.thresh()
  3. cv2.Thresh()
  4. cv2.Threshold()

Answer: 1
Explanation: In OpenCV, cv2.threshold() can be used to threshold an image. Refer to this link to know more.
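
A minimal usage sketch (the filename and the threshold value 127 are only illustrative):

```python
import cv2

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)
# Pixels above 127 become 255, the rest become 0
ret, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
```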

Q3. Difference between the image and its opening is known as __________?

  1. Black top-hat transform
  2. White top-hat transform
  3. Hit or Miss
  4. Closing

Answer: 2
Explanation: The difference between the image and its opening is known as the White top-hat transform. Refer to this link to know more.

Q4. What is contrast in image processing?

  1. the difference in luminance or color that makes an object distinguishable from other objects within the same field of view.
  2. the difference in the resolution that makes an object distinguishable from other objects within the same field of view.
  3. same as brightness
  4. There is no such term in image processing!!!

Answer: 1
Explanation: In image processing, contrast refers to the difference in luminance or color that makes an object distinguishable from other objects within the same field of view. To know more about contrast, refer to this link.

Q5. Which of the following refers to a type of noise in an image?

  1. Gaussian
  2. Salt and Pepper
  3. Speckle
  4. All of the above

Answer: 4
Explanation: All of the above refer to types of noise in an image.

Q6. Which of the following Morphological operations can be used for shape detection or finding particular patterns in the given image?

  1. Morphological Gradient
  2. Hit-or-Miss Transform
  3. Top-hat Transform
  4. Opening

Answer: 2
Explanation: Hit-or-Miss Transform can be used for shape detection or finding particular patterns in the given image. In this, we use two structuring elements (say B1 and B2) and ask a simple question: does B1 fit the object while, simultaneously, B2 misses the object, i.e. fits the background? Refer to this link to know more.

Q7. Which of the following matplotlib functions can be used to calculate the image histogram? Suppose we import as “import matplotlib.pyplot as plt”

  1. plt.hist()
  2. plt.calcHist()
  3. plt.showHist()
  4. plt.histCalc()

Answer: 1
Explanation: In matplotlib, plt.hist() can be used to calculate the image histogram. Refer to this link to know more about this function.
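
A minimal usage sketch (the filename is only illustrative); the image is flattened so plt.hist() sees a 1-D array of intensity values:

```python
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)
plt.hist(img.ravel(), bins=256, range=(0, 256))
plt.xlabel('Intensity value')
plt.ylabel('Number of pixels')
plt.show()
```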

Q8. In Perspective Transformation, what is the minimum number of points to select to obtain the transformation matrix?

  1. 1
  2. 2
  3. 3
  4. 4

Answer: 4
Explanation: In Perspective Transformation, the transformation matrix (M) is defined by 8 constants, thus to find this matrix we need 4 points. Refer to this link to know more.
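
A minimal sketch (the filename and the point coordinates are only illustrative): we pick 4 source points and the 4 points they should map to, compute the 3×3 matrix, and warp.

```python
import cv2
import numpy as np

img = cv2.imread('input.jpg')
src = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])
dst = np.float32([[0, 0], [300, 0], [0, 300], [300, 300]])
M = cv2.getPerspectiveTransform(src, dst)   # 3x3 matrix with 8 unknowns
warped = cv2.warpPerspective(img, M, (300, 300))
```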

Image Processing Quiz-1

Q1. Which domain refers to the Fourier transform of an image?

  1. Spatial domain
  2. Frequency domain

Answer: 2
Explanation: In the transform domain, we first transform an image into another domain (like frequency) by applying, for instance, the Fourier transform, do the processing there, and then convert the result back to the spatial domain by the corresponding inverse operation.

Q2. Which of the following techniques can be used for image segmentation?

  1. Histogram Equalization
  2. CLAHE
  3. Histogram Backprojection
  4. Histogram Specification

Answer: 3
Explanation: Histogram Backprojection can be used for image segmentation. To know more about Histogram Backprojection, refer to this link.

Q3. Which of the following OpenCV functions can be used to perform convolution operations?

  1. cv2.filter2D()
  2. cv2.convolve()
  3. cv2.filter()
  4. cv2.conv2D()

Answer: 1
Explanation: In OpenCV, cv2.filter2D() can be used to perform convolution operation. Refer to this link to know more about this function.
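
A minimal usage sketch (the filename and the 5×5 averaging kernel are only illustrative; note that cv2.filter2D() actually computes correlation, which equals convolution for symmetric kernels):

```python
import cv2
import numpy as np

img = cv2.imread('input.jpg')
# A 5x5 averaging kernel; ddepth=-1 keeps the output depth same as the input
kernel = np.ones((5, 5), np.float32) / 25
smoothed = cv2.filter2D(img, -1, kernel)
```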

Q4. Which color model is used in printing?

  1. Additive color model
  2. Subtractive color model

Answer: 2
Explanation: Subtractive color model (CMYK) is used in printing. Refer to this link to know more.

Q5. Which domain refers to the image plane itself?

  1. Spatial domain
  2. Frequency domain

Answer: 1
Explanation: Spatial domain refers to the image plane. This means we perform all operations directly on image pixels.

Q6. In the dilation operation, generally the output image features become ________ ?

  1. Thinner
  2. Thicker
  3. Blurred
  4. Sharpened

Answer: 2
Explanation: Because Dilation dilates or expands the object region, the output image features become thicker. Refer to this link to know more.

Q7. Dilation is the ________ of Erosion?

  1. dual
  2. rotated version
  3. translated version
  4. neighbor

Answer: 1
Explanation: Dilation is the dual of erosion. Dual in the sense that dilating the object region is equivalent to eroding the background region and vice versa.

Q8. What type of filters results in image sharpening?

  1. Low Pass filters
  2. High Pass filters

Answer: 2
Explanation: Because high pass filters enhance the high-frequency parts of an image, they result in image sharpening.

Blur Detection using the variance of the Laplacian method

In the previous blog, we discussed how to detect low contrast images using the scikit-image library. Similar to low contrast images, blurred images also don’t provide much useful information for our task, so it’s better to discard them before doing any computer vision task. Blur detection is an active research topic and several algorithms have been proposed not only for detecting blur but also for deblurring the image. So, in this blog, we will discuss one such simple yet effective method for detecting blur. Let’s get started.

A blurry image doesn’t have well-defined edges, so if you calculate the Laplacian of such an image, you will get more or less the same response everywhere. In other words, the variance of this Laplacian image will be low. Now the main question is how low is low. So you choose a threshold, and if the variance is less than this threshold, the image is considered blurred; otherwise it is not.

So, for a blurred image, the variance of the Laplacian will be lower compared to a sharp image. That is why this method is known as the variance of the Laplacian.

Now, the main thing is to set a threshold that decides whether an image is blurred or not. Actually, this is the tricky part and it all depends on your application. So you may need to try out different threshold values and pick the one that works well. I hope you understood this. Now, let’s see how to implement this using OpenCV-Python.

Steps

  • Load the image
  • Convert this to greyscale
  • Calculate the Laplacian of this image and find its variance
  • If variance < threshold then blurred, otherwise not
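
Here is a minimal sketch of these steps (the threshold value of 100 is only illustrative and, as discussed above, depends on your application):

```python
import cv2

def is_blurred(path, threshold=100.0):
    image = cv2.imread(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Variance of the Laplacian: low variance -> few strong edges -> blurred
    variance = cv2.Laplacian(gray, cv2.CV_64F).var()
    return variance < threshold, variance
```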

So this is how this method works. Since the Laplacian is very sensitive to noise, this may not always give good results. Setting a good threshold value is also a tricky part. This method is fast and easy to implement, but it is not guaranteed to work in every case. As mentioned, this is an active research area, so in the next blog we will use the Fourier transform and see how it goes. That’s all for this blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

Finding Corners with SubPixel Accuracy

In the previous blogs, we discussed how to find corners using algorithms such as Harris Corner, Shi-Tomasi, etc. If you notice, the detected corners had integer coordinates such as (17, 34). This generally works if we are extracting these features for recognition purposes, but when it comes to geometrical measurements we need more precise corner locations, such as real-valued coordinates (17.35, 34.67). So, in this blog, we will see how to refine the corner locations (detected using the Harris or Shi-Tomasi detector) with sub-pixel accuracy.

OpenCV

OpenCV provides a built-in function cv2.cornerSubPix() that finds the sub-pixel accurate locations of the corners. Its syntax is

cv2.cornerSubPix(image, corners, winSize, zeroZone, criteria)

This function uses the dot product trick and iteratively refines the corner locations until the termination criteria are reached. Let’s understand this in somewhat more detail.

Consider the image shown below. Suppose, q is the starting corner location and p is the point located within the neighborhood of q.

Clearly, the dot product between the gradient at p and the vector q-p is 0. In the first case, because p0 lies in a flat region, the gradient is 0 and hence so is the dot product. In the second case, the vector q-p1 lies along the edge, and since the gradient is perpendicular to the edge, the dot product is 0.

Similarly, we take other points in the neighborhood of q (defined by the winSize parameter) and set the dot product of the gradient at each point and the corresponding vector to 0, as we did above. Doing so, we get a system of equations. These equations form a linear system that can be solved by the inversion of a single autocorrelation matrix. But this matrix is not always invertible owing to small eigenvalues arising from the pixels very close to q. So, we simply reject the pixels in the immediate neighborhood of q (defined by the zeroZone parameter).

This will give us the new location for q. Now, this will become our starting corner location. Keep iterating until the user-specified termination criterion is reached. I hope you understood this.

Now, let’s take a look at the arguments that this function accepts.

  • image: Input single-channel, 8-bit grayscale or float image
  • corners: Array that holds the initial approximate location of corners
  • winSize: Size of the neighborhood where it searches for corners. This is half of the side length of the search window. For example, if winSize = (5, 5), then a (5*2+1)×(5*2+1) = 11×11 search window is used
  • zeroZone: This is half of the neighborhood size we want to reject. If you don’t want to reject anything, pass (-1, -1)
  • criteria: Termination criteria. You can stop after a specified number of iterations, once a certain accuracy is achieved, or whichever occurs first.

For instance, in the above image the red pixel is the initial corner. The winSize is (3,3) and the zeroZone is (1,1). So, only the green pixels have been considered for generating equations while the grey pixels have been rejected.

Now, let’s take the below image and see how to do this using OpenCV-Python

Steps

  • Load the image and find the corners using the Harris Corner Detector as we did in the previous blog. You can also use the Shi-Tomasi detector
  • Now, there may be a bunch of pixels at each corner, so we take their centroids
  • Then, we define the stopping criteria and refine the corners to subpixel accuracy using cv2.cornerSubPix()
  • Finally, we mark the Harris corners in red and the refined corners in green (a code sketch following these steps is given below)
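
A rough sketch of these steps, in the spirit of the standard OpenCV tutorial code (the filename, the Harris parameters, and the winSize/zeroZone/criteria values are only illustrative):

```python
import cv2
import numpy as np

img = cv2.imread('chessboard.png')
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# Step 1: Harris corner response, thresholded to a binary corner mask
dst = cv2.cornerHarris(gray, 2, 3, 0.04)
dst = cv2.dilate(dst, None)
ret, dst = cv2.threshold(dst, 0.01 * dst.max(), 255, 0)
dst = np.uint8(dst)

# Step 2: centroids of the connected corner pixels
ret, labels, stats, centroids = cv2.connectedComponentsWithStats(dst)

# Step 3: refine the centroids to sub-pixel accuracy
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv2.cornerSubPix(gray, np.float32(centroids), (5, 5), (-1, -1), criteria)

# Step 4: mark Harris corners in red and refined corners in green
res = np.int32(np.hstack((centroids, corners)))
img[res[:, 1], res[:, 0]] = [0, 0, 255]
img[res[:, 3], res[:, 2]] = [0, 255, 0]
cv2.imwrite('subpixel_result.png', img)
```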

Below are the results of this. For visualization, I have shown the zoomed in version on the right.

Applications

Subpixel corner locations are a common measurement used in camera calibration, in tracking to reconstruct the camera’s path or the three-dimensional structure of a tracked object, and in some algorithms such as SIFT (discussed in the next blog).

That’s all for this blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.

SIFT: Scale-Space Extrema Detection

In the previous blog, we had an overview of the SIFT algorithm. We discussed different steps involved in this and the invariance that it offers against scale, rotation, illumination, viewpoint, etc. But we didn’t discuss it in detail. So, in this blog, let’s start with the first step which is scale-space extrema detection. So, let’s get started.

Before moving forward, let’s quickly recap what we are doing, why we are doing it, and how we are doing it.

  • We saw how corner detectors like Harris, Shi-Tomasi, etc. suffer when it comes to scaling. (why we are doing it)
  • So, we want to detect features that are invariant to scale changes and can be robustly detected. (what we are doing)
  • This can be done by searching for stable features (extrema) across all possible scales, using a continuous function of scale known as scale space. That’s why the name scale-space extrema detection. (how we are doing it)

I hope you understood this. Now, let’s understand what a scale-space is.

What is a Scale-Space?

Real-world objects are composed of different structures at different scales. For instance, the concept of a “tree” is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. It would make no sense to analyze leaves or molecules at the scale of the tree (meters). So, this means that you need to analyze everything at an appropriate scale in order to make sense of it.

But given an image or an unknown scene, there is no a priori way to determine what scales are appropriate for describing the interesting structures in the image data. Hence, the only reasonable approach is to consider descriptions at multiple scales. This representation of images at multiple scales constitutes a so-called scale-space.

How to construct a Scale-Space?

Now, the next thing is how to construct a scale-space. As we know, if we increase the scale, the fine details are lost and only the coarser information prevails. Can you relate this to something you did in image processing? Does blurring an image with a low pass filter sound similar? The answer is yes, but there is a catch: we can’t use just any low pass filter; only the Gaussian filter helps in mimicking a scale space. This is because the Gaussian filter is shown to be the only filter that obeys the following:

  • Linearity
  • Shift-invariance
  • The smoothing process does not produce new structures when going from a fine to a coarser scale
  • Rotational symmetry and some other properties (You can read about it on Wikipedia)

So, to create a scale space, you take the original image and generate progressively blurred-out images using a Gaussian filter. Mathematically, the scale-space representation of an image can be expressed as

L(x, y, σ) = G(x, y, σ) ∗ I(x, y)

where ∗ denotes convolution and:

  • L(x,y,σ) is the blurred image or scale space representation of an image
  • G(x,y,σ) is the Gaussian filter
  • I(x,y) is the image
  • σ is the scaling parameter or the amount of blur. As we increase σ, more and more details are removed from the image i.e. more blur

See below, where an image is shown at different scales (σ) (source: Wikipedia). Notice how, at larger scales, the fine details are lost.

So, I hope you understood what is a scale-space and how to construct it using Gaussian filter.

Scale-Space in SIFT

In the SIFT paper, the authors modified the scale-space representation. Instead of creating the scale-space representation for the original image only, they created the scale-space representations for different image sizes. This helps in increasing the number of keypoints detected. The idea is shown below

Take the original image and generate progressively blurred-out images. Then resize the original image to half its size and generate blurred-out images again. Keep repeating this. The idea is shown below.

Here, we use the term octave to denote the scale-space representation for a particular image size. For instance, all the same-size images in a vertical line form one octave. Here, we have 3 octaves and each octave contains 4 images at different scales (blurred using a Gaussian filter).

Within an octave, the adjacent scales differ by a constant factor k. If an octave contains s+1 images, then k = 2^(1/s). The first image has scale σ0, the second image has scale kσ0, the third image has scale k²σ0, and the last image has scale k^s·σ0. In the paper, the values used are: number of octaves = 4, number of scale levels = 5, initial σ0 = 1.6, k = √2, etc.
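
A rough sketch of this construction is shown below (this is not the exact SIFT implementation, which blurs incrementally and downsamples a particular image within each octave; the default parameter values simply follow the numbers quoted above):

```python
import cv2
import numpy as np

def build_scale_space(image, num_octaves=4, scales_per_octave=5,
                      sigma0=1.6, k=np.sqrt(2)):
    """image: single-channel grayscale image."""
    pyramid = []
    base = image.copy()
    for _ in range(num_octaves):
        octave = []
        for i in range(scales_per_octave):
            sigma = sigma0 * (k ** i)
            # ksize=(0, 0) lets OpenCV derive the kernel size from sigma
            octave.append(cv2.GaussianBlur(base, (0, 0), sigma))
        pyramid.append(octave)
        # Halve the image size for the next octave
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2),
                          interpolation=cv2.INTER_NEAREST)
    return pyramid
```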

How to decide the number of octaves and number of scales per octave?

The number of octaves and scales depends on the size of the original image. You need to adjust this yourself depending upon the application.

But it has been found empirically that 3 scales sampled per octave provide optimal repeatability under downsampling/upsampling/rotation of the image as well as image noise. Adding more scales per octave increases the number of detected keypoints, but this does not improve the repeatability (in fact there is a small decrease), so we settle for the computationally less expensive option. See the plot below.

So, once we have constructed the scale-space, the next task is to detect the extrema in this scale-space. That’s why this step is called scale-space extrema detection. To keep this blog short, we will discuss this in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.