In the previous blogs, we discussed corner detectors such as Harris and Shi-Tomasi. If you remember, these detectors are rotation invariant, which basically means that even if the image is rotated, we can still detect the same corners. This is obvious because a corner remains a corner in the rotated image too. But when it comes to scaling, these algorithms suffer and don’t give satisfactory results, because when we scale the image, a corner may not remain a corner. Let’s understand this with the help of the following image (Source: OpenCV)
See, on the left we have a corner inside the small green window. But when this corner is zoomed in (see on the right), it no longer looks like a corner within a window of the same size. So, this is the issue that scaling poses. I hope this makes the problem clear.
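To make this concrete, here is a minimal NumPy sketch of the Harris corner response (the toy image, the 3x3 window, and k = 0.04 are my own illustrative choices, not taken from any particular implementation). On a bright square, the response is strongly positive at the corners and negative along the edges:

```python
import numpy as np

def box_filter(a, r=1):
    # Average over a (2r+1) x (2r+1) window using shifted sums.
    p = np.pad(a, r)
    out = np.zeros(a.shape, dtype=float)
    h, w = a.shape
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def harris_response(img, k=0.04):
    # Structure tensor from image gradients, then R = det(M) - k * trace(M)^2.
    Iy, Ix = np.gradient(img.astype(float))
    Sxx = box_filter(Ix * Ix)
    Syy = box_filter(Iy * Iy)
    Sxy = box_filter(Ix * Iy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace

# Toy image: a bright square whose corners should give the strongest response.
img = np.zeros((32, 32))
img[10:22, 10:22] = 1.0
R = harris_response(img)
peak = np.unravel_index(np.argmax(R), R.shape)  # lands on one of the four corners
```

Now imagine zooming into this square until one edge fills the whole 3x3 window: the response at the former "corner" pixel becomes edge-like (negative), which is exactly the scale sensitivity that SIFT was designed to fix.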
So, to solve this, in 2004, D. Lowe of the University of British Columbia, in his paper Distinctive Image Features from Scale-Invariant Keypoints, came up with a new algorithm: the Scale-Invariant Feature Transform (SIFT). This algorithm not only detects features but also describes them. And the best thing about these features is that they are invariant to changes in
- Scale
- Rotation
- Illumination (partially)
- Viewpoint (partially)
- Minor image artifacts/ Noise/ Blur
That’s why this was a breakthrough in the field at that time. So, you can use these features to perform different tasks such as object recognition, tracking, image stitching, etc., without worrying about scale, rotation, and so on. Isn’t it cool that we had all this back in 2004!
There are mainly four steps involved in the SIFT algorithm to generate the set of image features:
- Scale-space extrema detection: As the name suggests, we first search over all scales and image locations (space) and determine the approximate location and scale of feature points (also known as keypoints). In the next blog, we will discuss how this is done, but for now just remember that this first step simply finds the approximate location and scale of the keypoints.
- Keypoint localization: In this step, we take the keypoints detected in the previous step and refine their location and scale to sub-pixel accuracy. For instance, if the approximate location is 17, then after refinement it may become 17.35 (more precise). Don’t worry, we will discuss how this is done in the next blogs. After the refinement, we discard bad keypoints such as edge points and low-contrast keypoints. So, after this step we get a robust set of keypoints.
- Orientation assignment: Then we calculate the orientation for each keypoint using its local neighborhood. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
- Keypoint descriptor: All the previous steps ensured invariance to image location, scale, and rotation. Finally, we create a descriptor vector for each keypoint such that the descriptor is highly distinctive and partially invariant to the remaining variations, such as illumination and 3D viewpoint. This helps in uniquely identifying features. Once we have obtained these features along with their descriptors, we can do whatever we want, such as object recognition, tracking, stitching, etc. This sums up the SIFT algorithm at a coarse level.
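To give a flavour of step 1 before we dive into it, here is a rough NumPy sketch of scale-space extrema detection. Be warned that this is only a toy version: the number of levels, the base sigma of 1.6, and the contrast threshold are illustrative choices, and the real algorithm also uses octaves with downsampling plus the sub-pixel refinement of step 2.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian blur with a truncated kernel.
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    k /= k.sum()
    tmp = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, tmp, k, mode="same")

def dog_extrema(img, num_scales=6, sigma0=1.6, thresh=0.01):
    # Build a stack of increasingly blurred images, take differences (DoG),
    # then keep pixels that are extrema over their 3x3x3 neighbourhood
    # (8 spatial neighbours plus 9 above and 9 below in scale).
    sigmas = [sigma0 * 2 ** (i / 3) for i in range(num_scales)]
    blurred = [gaussian_blur(img, s) for s in sigmas]
    dog = [b1 - b0 for b0, b1 in zip(blurred, blurred[1:])]
    h, w = img.shape
    keypoints = []
    for s in range(1, len(dog) - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dog[s][y, x]
                if abs(v) < thresh:
                    continue  # reject low-contrast candidates
                cube = np.stack([d[y-1:y+2, x-1:x+2] for d in dog[s-1:s+2]])
                if v == cube.max() or v == cube.min():
                    keypoints.append((x, y, sigmas[s]))
    return keypoints

# Toy image: a single Gaussian blob; an extremum should appear near its centre,
# at a scale comparable to the blob's size.
yy, xx = np.mgrid[0:48, 0:48]
blob = np.exp(-((yy - 24.0) ** 2 + (xx - 24.0) ** 2) / (2 * 3.0 ** 2))
kps = dog_extrema(blob)
```

The key idea to take away: a blob-like structure produces a DoG extremum not just at a position but at a scale, and that detected scale is what later makes the descriptor scale invariant.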
Because SIFT is an extensive algorithm, we won’t be covering it in a single blog. We will understand each of these four steps in separate blogs and finally implement the whole thing using OpenCV-Python. As we proceed, we will also understand how this algorithm achieves the scale, rotation, illumination, and viewpoint invariance discussed above.
So, in the next blog, let’s start with the scale-space extrema detection and understand this in detail. See you in the next blog. Hope you enjoy reading.
If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Goodbye until next time.