Author Archives: kang & atul

Optical Character Recognition Pipeline: Text Detection

In the previous blogs, we discussed different pre-processing techniques such as noise removal, skew correction, etc. The main objective of this pre-processing step was to make the image suitable for the next pipeline components such as text detection, and recognition. Now, in this blog, let’s understand the text detection step in detail.

Text Detection

Text detection simply means finding the regions in the image where the text can be present. For instance, see the below image where green colored bounding boxes are drawn around the detected text.

While performing text detection, you may encounter two types of cases

  • Images with Structured text: This refers to the images that have a clean/uniform background with regular font. Text is mostly dense with proper row structure and uniform text color. For instance, see the below image.
  • Images with Unstructured text: This refers to the images with sparse text on a complex background. The text can have different colors, size, fonts, and orientations and can be present anywhere in the image. Performing text detection on these images is known as scene text detection. For instance, see the below image.

Now, if I ask, which one of the above two cases looks more challenging. Obviously, the answer would be the scene text detection one, due to various complexities as discussed above. And that’s why this is an active research topic in computer vision.

Note: For more details on the Optical Character Recognition , please refer to the Mastering OCR using Deep Learning and OpenCV-Python course.

While performing text detection, you have 3 options. Either you do

  • Character-by-Character detection
  • Word-by-Word detection
  • Line-by-Line detection

All three are shown below.

Nowadays, we mostly prefer doing word or line detection. This is because the character detection is generally slow and is somewhat more challenging as compared to the other two.

Mostly, the text detection methods can be broadly classified into 2 categories

  • Conventional methods
  • Deep-learning based methods

Conventional methods rely on manually designed features. For instance, Stroke width Transform (SWT) and Maximally Stable Extremal Regions (MSER) based methods generally extracts the character candidates via edge detection or extremal region extraction. While in the deep learning based methods, features are learned from the training data. These are generally better than the conventional ones, in terms of both accuracy and adaptability in challenging scenarios.

Further, the deep learning based methods can be classified into

  • Multi-step methods
  • Simplified pipeline

To understand these, take a look at the below image where the pipeline of several state-of-the-art text detection methods is shown. The first 3 methods (a,b,c) fall into the multi-step category (each box denotes 1 step) while the last 2 (d,e) are the ones with a simplified pipeline.

In this series, we will be mainly focussing on the methods with the simplified pipeline. By the way, the last 2 methods (d,e) shown above are known as Connectionist Text Proposal Network (CTPN) and Efficient and Accurate Scene Text Detector (EAST) respectively. Both of these are very famous text detection methods!!!

In the next blog, let’s discuss the EAST algorithm in detail. Till then, have a great time. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Template matching using OpenCV

In the previous blogs, we discussed different segmentation algorithms. Now, let’s explore another important computer vision area known as object detection. This simply means identifying and locating objects, that is, where is this object present in the image. There are various algorithms available for object detection. In this blog, let’s discuss one such algorithm known as Template matching. So, let’s get started.

In the template matching, we have a template image and we need to find where is this in the input image. For this, the sliding window approach is used. In the sliding window approach, we simply slide the template image over the input image (similar to convolution) and compare the overlapping patch. For comparison, you can use any method such as cross-correlation, squared difference, etc. Below is the list of the comparison methods provided by OpenCV.

Here, I(x,y) denotes the input image, T(x,y) template image, R(x,y) result, and (w,h) as width and height of the template image.

This outputs a grayscale image, where each pixel represents how much does the neighborhood of that pixel match with the template (i.e. the comparison score). From this, we can either select the maximum/minimum (depending on the comparison method used) or use thresholding to select the probable region of interest. This is how the template matching works. Now, let’s see how to do this using OpenCV-Python.

OpenCV

OpenCV provides a built-in function cv2.matchTemplate() that implements the template matching algorithm. This takes as input the image, template and the comparison method and outputs the comparison result. The syntax is given below.

Let’s take the below input and template image and see how this works.

  • Read both the input and the template image
  • Apply the template matching using cv2.matchTemplate()
  • If the method is cv2.TM_SQDIFF or cv2.TM_SQDIFF_NORMED, take the minimum, otherwise, take the maximum. This can be done using the cv2.minMaxLoc() function which finds the minimum and maximum element values and their positions.
  • Once the min/max position is found, we can easily draw the rectangle by taking this position as the top-left corner and using (w,h) of the template.

Template Matching with Multiple Objects

But what if we have multiple instances of the object present in an image. Clearly, the above approach will not work as this only finds a single instance of an object because we are finding max/min location. One plausible approach would be to use thresholding. Instead of taking out the max/min, we take out all the values greater/less than a certain threshold limit. This will not always work perfectly but still in some cases this approach can provide reasonable results.

Let’s take the below image and template to understand the multiple objects case.

Below is the resultant image.

Problems!!!

From the above result, you might have guessed the problems with this approach. Clearly, template matching is translation invariant. But as you can see from the above image, the hearts which are rotated and which have smaller size are not detected. Thus, this algorithm is not rotation and scale-invariant. Other potential problems include occlusion, illumination, and background changes, etc.

In the next blog, let’s improve this algorithm further and make it more robust against scale. For this, we will use the concept of image pyramids. See you in the next blog. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Interactive Foreground Extraction using GrabCut Algorithm OpenCV

In this blog, we will discuss how to use the GrabCut algorithm for the foreground extraction. At that time (around 2004), the GrabCut algorithm outperformed most of the available foreground extraction methods both in terms of the resulting output quality and the simplicity of user input. Let’s first discuss the theory part and then implement it using OpenCV. So, let’s get started.

Theory

In this first, we need to draw a rectangle around the foreground region such that the region outside the rectangle is sure background while the region inside the rectangle is a combination of both foreground and background as shown below. Everything inside the rectangle is unknown, which we need to classify into foreground and background.

Now, we initialize everything outside the rectangle to be 0 (background) and inside to 1 (foreground). This is our initial labelling.

The authors have used a Gaussian Mixture Model (GMM) to model the foreground and background distributions. So, we will initialize 2 GMMs (one for background and the other for foreground) with k components (k=5 used) from the initial labeling we did earlier. Here, the GMM parameters correspond to the weights π, means µ, and covariances Σ of the 2K Gaussian components for the background and foreground distributions.

From these GMMs, create a new pixel distribution for the unknown region. From this pixel distribution, a graph is constructed with nodes representing the pixels as shown below. Each neighboring node is linked with edges whose weights are defined by the edge information or pixel similarity. This means similar pixels (in terms of color) will get higher edge weightage and vice-versa.

Further 2 additional nodes are added, the Source node and the Sink node. Every foreground pixel is connected to the Source node and every background pixel is connected to the Sink node with weights defined by the probability of pixel belonging to the foreground/background (obtained from GMMs).

Now, we use a MinCut algorithm to segment the graph. This divides the graph into two groups(separating the source node and sink node) that minimize the cost function. Label all the pixels (nodes) belonging to the source as foreground while those connected to sink as background. This is our new labeling. Now, repeat this process until convergence.

There is also an additional option for user editing. Let’s understand what this means. In some cases, the segmentation output will not be perfect, that is, some foreground regions may be marked as background and vice-versa. In that case, the user can specify the faulty regions (as belonging to the foreground and background) using a mask image. Re-run the algorithm. This approach is shown below.

OpenCV

OpenCV provides a built-in function cv2.grabCut() that implements the GrabCut algorithm. This provides both the modes, with a rectangle or with a mask as discussed above. The syntax is given below.

  • img: Input 8-bit 3-channel image
  • mask: 8-bit, single-channel image. In the case of rectangle mode, the input mask is initialized with 0’s. While for the mask mode, the input mask should contain the background and foreground regions labeled with 0’s and 1’s respectively.
  • rect: ROI containing the foreground in the form of (x,y,w,h) as discussed above. Only used when mode= cv2.GC_INIT_WITH_RECT otherwise set to None.
  • bgdModel, fgdModel: Temporary arrays used by the algorithm internally. Just create two 0’s arrays of size (1,65) and float64 dtype. Do not modify it while you are processing the same image.
  • iterCount: Number of iterations.
  • mode: Either cv2.GC_INIT_WITH_RECT or cv2.GC_INIT_WITH_MASK depending on whether we are drawing a rectangle or mask with strokes.

This outputs the modified mask image where each pixel belongs to either of the 4 classes sure background, sure foreground, probable background, and probable foreground. These 4 regions are specified by either values 0,1,2, and 3 or by flags cv2.GC_BGD,  cv2.GC_FGD,  cv2.GC_PR_BGD,  cv2.GC_PR_FGD respectively. Now, let’s take the below image and implement this algorithm using OpenCV.

Let’s start with the rectangular mode. First load the image and create a 0’s mask and fgdModel and bgdModel as discussed above.

Next, draw the rectangle around the ROI. The coordinates can be obtained by opening the image using matplotlib or any other application such as Paint etc. I used matplotlib as shown below. Found the coordinates by hovering the mouse.

The rectangle is shown in red color in the below image.

Now, run the grabcut algorithm. This will output the modified mask.

As discussed, in the modified mask image, 0 and 2 corresponds to the background while 1 and 3 correspond to the foreground.

Below is the mask and the segmented output image.

Clearly, all the background region is segmented out but there are some foreground regions missing such as head, fingers, captain armband, etc. So, let’s see how to recover these regions using the user editing feature as discussed above.

Open this segmented image in any editing software such as Paint etc. Now, mark the missing regions with any color. Here, I have used white as shown below. Since there is no background part misclassified as foreground, so no need to label the background. If that’s not the case, you need to mark it with a different color. Below is the marked image.

To obtain the mask, just subtract the above 2 images.

Below the final mask is shown. This is what you marked using editing software. You can actually directly create this using any software.

Now, in the modified mask, mark this as the sure foreground. Remember, sure foreground needs to be labeled as 1 in the mask.

So, your mask now contains user edited information. Run the grabcut algorithm with mask mode.

Again change pixels with value 0 and 2 to background, 1 and 3 to foreground.

Below is the final segmented image.

See how we are able to segment the missing foreground regions now. The full code is given below

In the next blog, we will see an interactive implementation of the grabcut algorithm. Hope you enjoy reading.

Referenced Research Paper:  “GrabCut”: interactive foreground extraction using iterated graph cuts

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time

Image Segmentation with Watershed Algorithm

In this blog, we will discuss marker-based image segmentation using the watershed algorithm in detail. So, let’s first discuss what is a watershed in the context of an image.

The watershed algorithm is based on the concept of visualizing an image as a topographic surface where high-intensity values denote peaks and hills while the low intensity denotes valleys. This can be obtained by plotting the (x,y) image coordinates versus the intensity as shown below.

In this kind of surface plot, there are generally 3 types of points

  1. Points which are local minimum
  2. Points at which if you place a drop of water, that drop will fall with certainty to a single minimum
  3. Points at which the drop is equally likely to fall to more than 1 such minimum

Now, for a particular minimum, the set of points that satisfy the 2nd condition is called the catchment basin or watershed of that minimum. While those satisfying the 3rd condition are termed as divide lines or watershed lines. And this is the main objective of the watershed algorithm, which is to find the watershed lines. I hope you understood what is a watershed in the context of images. Now, let’s discuss how the watershed algorithm finds such lines using the analogy shown below.

Suppose a hole is punched in each local minima. Start filling each local minima(from these holes) with different colored water(labels) at a uniform rate. After some time, water from different catchment basins or valleys will start to merge. To prevent this, dams or barriers are constructed at the locations where water merges. A stage will come when all the peaks are underwater and only the tops of the dams(barriers) are visible. These dam boundaries correspond to the watershed lines. This is the philosophy behind the watershed algorithm.

But this approach produces over-segmented results due to noise or other irregularities in the image. So, to overcome this, instead of filling each local minima what we can do is only fill some of them. But how to decide which are those minimums? This is where the concept of marker comes.

Markers are the set of pixels from where the flooding will start. Intuitively, think of markers as the pixels which we are sure of belonging to the objects present in the image. Generally, the number of markers is equal to the number of objects (classes) + 1 (for background). For a better understanding, see the below image. Here, we have 4 markers (see right image), 1 for background, and 3 for coins. All the markers are represented with different pixel values. The pixels with value 0 (black) in the marker image are the unknown regions and the watershed algorithm assigns each of these pixels a class out of the 4 classes (3 coins and 1 background).

These markers can either be explicitly defined by the user or can be automatically determined using morphological operators or by other methods. Intuitively, using these markers we are telling the watershed algorithm to group points like these together.

So, the concept is simple. Start growing these markers (labeled with different colors(labels)). When they are about to meet, construct barriers at those locations. These barriers give us segmentation results. Now, let’s take an example to understand how to implement the watershed algorithm using OpenCV.

OpenCV

OpenCV provides a built-in cv2.watershed() function that performs a marker-based image segmentation using the watershed algorithm. This takes as input the image (8-bit, 3-channel) along with the markers(32-bit, single-channel) and outputs the modified marker array. The syntax is given below.

As explained above, label the markers with different pixel values (such as 1,2,3, and so on) and the unknown pixels which we are not sure of anything with a value of 0. In the output, each pixel is either set to a marker value or -1 if it belongs to the boundary. Now, let’s take an image and implement this algorithm using OpenCV-Python.

In this, we will use the distance transform method along with contours to create the markers. So, for this first we need to binarize the image. This can be done using OTSU’s as shown below.

Clearly, there are some holes present in the coins. So, to remove this we can use morphological closing as shown below.

Now, we apply the Distance Transform on the binary image. In this, for each foreground pixel, we calculate its Euclidean distance to the closest background pixel (0’s here). This can be done using the cv2.distanceTransform function as shown below.

Obviously, the bright pixels are the sure markers for each coin. To extract the sure foreground, we can use the thresholding.

Now, let’s label each of these markers with different pixel values using contours as shown below. For that first create a 32-bit, single-channel marker image of all 0’s. Then find the contours and draw them (filled with the value of contour index +1).

This will create markers for the foreground region. Next, we need to represent the marker for the background region. Here, I have manually created a marker using the cv2.circle() and with a value of len(contours)+1 as shown below.

Now our markers are ready. It’s time for the final step, apply watershed and visualize the results.

See how we are able to segment the coins. That’s all for this blog. In the next blog, we will discuss how to create these markers using the morphological operations. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Hough Circle Transform

In the previous blog, we discuss how the Hough transform can be used to detect lines in an image. Now, in this blog, let’s extend our knowledge of Hough transform to detect circles. This is also known as Hough Circle Transform. So, let’s get started.

We know that a circle can be represented as (x-a)2 + (y-b)2 = r2 where a, b represents the circle center and r is the radius. So, we require 3 parameters (a,b,r) to completely describe the circle.

Recall from the previous blog, a line requires a 2D accumulator array as a line was defined by two parameters (r,θ). So, for a circle, we would require a 3D accumulator array. Now, let’s discuss how to fill this accumulator array.

Let’s take a simple case, where the radius r is known to us. In that case, the 3D accumulator array [a,b,r] will become 2D [a,b]. And each point in the (x, y) space will be equivalent to a circle in the (a, b) space as shown below.

Now, similar to what we did in the Hough line transform, we will first draw the circles in the ab space corresponding to each edge point. Then we will find the point of intersection (actually the local maxima in the accumulator array) which will correspond to the original circle center. Now, let’s understand this with the help of image taken from Wikipedia.

Here, for each white point in the left image, we draw the corresponding circles in the ab space (Assume that the radius is known to us). Then we find the intersection point (local maxima), see the red point in the right image. This will be the original circle center. This 2D accumulator corresponds to one row of the 3D accumulator array.

Here, we assumed that the radius is known to us. In practice, we take different radius values and repeat the above procedure. Corresponding to each r value, we will fill the 3D accumulator array. A simple question, what is the maximum plausible value that the radius can take? Obviously, for images, this can’t be greater than the diagonal length of the image.

Below is the pseudo-code for the Hough Circle Transform.

Algorithm

  • Initialize the accumulator (H[a,b,r]) to all zeros
  • Find the edge image using any edge detector
  • For r= 0 to diagonal image length
    • For each edge pixel (x,y) in the image
      • For Θ = 0 to 360
        • a = x – r*cosΘ
        • b = y – r*sinΘ
        • H[a,b,r] = H[a,b,r] +1
  • Find the [a,b,r] value(s), where H[a,b,r] is above a suitable threshold value

Problem

As you might have guessed, the above algorithm is inefficient since it involves dealing with 3D parameter space. Also, we may face a sparsity problem because only a few accumulator cells would be filled. So, to overcome this, OpenCV uses a slightly trickier method known as the Hough gradient method. So, let’s understand how that works.

OpenCV Hough Gradient Method

As is clear from the name, this method takes into account the gradient information. Earlier for each edge point, we were drawing the corresponding circles in the parameter space and thereby incrementing the accumulator cells. But now instead of drawing the full circle, we only increment the accumulator cells in the gradient direction of each edge pixel.

This algorithm consists of 2 steps:

  • First, you find all the plausible candidates for the circle centers
  • Then for each candidate center, we find the best radius

Let’s understand each step in detail. You can find the OpenCV code here.

First, the edge image is calculated using the Canny edge detector. Then the gradient information is computed using the Sobel operator for each edge pixel. Now, for each edge pixel, we increment the accumulator cells that lie in both directions of the gradient. Then you select all those accumulator cells that are both local maxima and above a certain threshold. All these accumulator cells are plausible candidates for circle centers.

Once you obtained the center candidates, now the task is to find the best radius for each candidate. Obviously, the parameter space is now reduced to 1D. To fill this 1D accumulator array, for each candidate center just calculate its distance from all edge pixels(this is exactly what radius is) and increment the corresponding accumulator cell. The best radius will be the one that is best supported (voted) by the edge pixels. Repeat this for each candidate center. Discard those centers that don’t have enough support and that are close to the previously selected center. This is what the Hough Gradient method does coarsely.

Now, let’s discuss how to perform the Hough Circle transform using OpenCV-Python.

OpenCV

OpenCV provides a built-in cv2.HoughCircles() function that finds circles in a grayscale image using the Hough transform. Below is the syntax

Below are the parameters explained in detail

  • image: 8-bit, single-channel, grayscale input image
  • method: HOUGH_GRADIENT and HOUGH_GRADIENT_ALT
  • dp: The inverse ratio of accumulator resolution and image resolution
  • minDist: Minimum distance between the centers of the detected circles. All the candidates below this distance are neglected as explained above
  • param1: it is the higher threshold of the two passed to the Canny edge detector (the lower canny threshold is twice smaller)
  • param2: it is the accumulator threshold for the circle centers at the detection stage as discussed above.
  • minRadius: minimum radius that you expect. If unknown, put zero as default.
  • maxRadius: if -ve, only circle centers are returned without radius search. If unknown, put zero as default.

Image Smoothing will help in this, unless the image is already soft. Let’s take an image and see how to implement this.

Steps

  • Read an image and blur it to reduce the noise
  • Apply the Hough Circle Transform
  • Display the detected circles

Play with the parameters in order to understand them better. That’s all for this blog. In the next blog, we will discuss the Probabilistic Hough Transform that is an optimization of the Hough Transform. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Hough Line Transform

Shape detection is an important area in the field of image processing. In the previous blog, we discussed how we can perform simple shape detection using contours. Since edge detection was used as a pre-processing step, so that approach was more susceptible to noise and missing edge points. To overcome this, in this blog, we will discuss Hough Transform, a popular method for detecting simple shapes that give robust shape detection under noise and partial occlusion. So, in this blog, let’s take an example of line detection and see how can we use the Hough transform to detect lines in an image. This method of line detection is also known as Hough Line Transform. So, let’s get started.

Hough Line Transform

Before going into detail, let’s first refresh some high school maths concepts that will be useful for understanding this.

We know that a line corresponds to a point in the parameter space as shown below. Why? because a line has a fixed slope and intercept value.

So, how this will help? Clearly, by changing the space, we reduced the line (consisting of many points) to a single point thus reducing storage and further computation.

Similarly, a point corresponds to a line in the parameter space as shown below. Why? because there can be infinite lines that can pass through a point and for each line we have a point in the parameter space.

Now, let’s combine the above two. Suppose you are given 2 points so can you find out where the line joining these 2 points map in the parameter space. Obviously, this would be equivalent to the intersection point as shown below.

This is what the Hough transform does. For each edge point, we draw the lines in the parameter space and then find their point of intersection (if any). The intersection points will give us the parameters (slope and intercept) of the line.

Problem

But there is one big problem with this parameter space representation. As you might have guessed that we can’t represent the vertical lines as this would require infinite m.

Solution

We use the polar or normal form of a line

{\displaystyle r=x\cos \theta +y\sin \theta }

Here, r is the perpendicular distance from the origin to the line and Θ is the angle formed by this perpendicular line with the origin as shown below.

You might ask isn’t r unbounded as this can take value from 0 to ∞. But since images are of finite size, thus r can take value from 0 to diagonal length of the image. Both r and Θ are finite and thus the above problem is resolved.

Now, let’s see what the line and point corresponds to in the (r,Θ) space.

A line in the (x,y) space still corresponds to a point in the (r, Θ) space. But a point in the (x,y) space is now equivalent to a sinusoidal curve in the (r, Θ) space as shown below.

Thus now for each edge pixel, we draw the sinusoidal curves in the (r, Θ) space and then find their point of intersection (if any).

Implementation

To find the point of intersection, Hough transform uses a voting method. In this, first, the parameter space (r, Θ) is discretized into bins as shown below. We call this an accumulator. Then for each edge pixel, we calculate the r value corresponding to each Θ value. And corresponding to each (r, Θ) value obtained, we increase the count of that (r, Θ) accumulator cell. Then find the bins with the highest value. Below is the pseudo-code of the Hough line transform algorithm.

Algorithm

  • Initialize the accumulator (H) to all zeros
  • For each edge pixel (x,y) in the image
    • For Θ = 0 to 180
      • Calculate r (r = x*cosΘ + y*sinΘ)
      • H(Θ,r) = H(Θ,r) +1
    • endFor
  • endFor
  • Find the (Θ,r) value(s), where H(Θ,r) is above a suitable threshold value.

OpenCV

OpenCV provides a built-in function cv2.HoughLines(), that finds the lines in a binary image using the above algorithm. This takes as input the binary image, the size of the accumulator, the threshold value and outputs the array of (r, Θ) values. The syntax is given below.

Below is the code for this.

Below is the result.

Limitations

  • Choosing a good accumulator size is difficult.
    • Too coarse: different lines will fall to a single bin
    • Too fine: votes will fall to the neighboring bins for points which are not exactly collinear
  • Time complexity increases exponentially wrt. parameters

That’s all for this blog. In the next blog, we will discuss how to detect some other shapes such as circle, ellipse, etc. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Convexity Defects OpenCV

In the previous blog, we discussed how to find the convex hull for a particular shape. In this blog, let’s extend our convex hull knowledge to understand another important function known as Convexity defects. You will find its use in many applications such as hand gesture recognition etc. So, let’s get started.

What are Convexity Defects?

Any deviation of the contour from its convex hull is known as the convexity defect. Let’s understand this with the help of the below image.

Here, the red line shows the convex hull, the grey line represents the contour and the black arrow shows the deviation of the hull from the contour (convexity defect). So, it’s obvious that the convex curve has no convexity defects. Now, let’s discuss how to find the convexity defects using OpenCV-Python.

OpenCV

OpenCV provides a function cv2.convexityDefects() for finding the convexity defects of a contour. This takes as input the contour and its corresponding hull indices and returns an array containing the convexity defects as output. The basic syntax is given below.

The output contains an array where each row contains 4 values [start point, endpoint, farthest point, approximate distance to the farthest point]. Note that the first three values returned are indices of contour.

Now, let’s understand the output returned by the cv2.convexityDefects() function in more detail using the below image. Here, the green line represents the hull while the red line shows the corresponding contour. Clearly, there are convexity defects.

OpenCV algorithm for finding Convexity Defects

  • for each pair (H[i], H[i+1]) of adjacent hull points
    • for each contour point (C[n]) that lies between [H[i], H[i+1])
      • Find the point with the maximum distance from the line joining (H[i], H[i+1])
    • store the index of (H[i], H[i+1], max distance point) and the max distance in an array
  • Return the array

Now, let’s take a part of the above image and understand how the above algorithm works. Suppose there are 3 contour points between the first pair of hull points (H[0], H[1]).

OpenCV calculates the distance (d1, d2, d3) as shown below.

  • Calculate the unit vector for (H[0], H[1]) and rotate it by 90 degrees
  • Then project the vectors formed by joining the contour points (C[1], C[2], C[3]) with H[0] onto the unit vector as shown below. The projection is equal to the distances (d1, d2, d3).

Similarly, repeat for each pair of hull points. The full code can be found here. Now, let’s visualize the convexity defects using OpenCV-Python and the image as shown below.

Steps

  • Read an image and find its contours. This can be done as
Image Contour

Now, find the convex hull for the particular contour as shown below. Remember to pass the returnPoints flag as False. This will give us the indices of the contour points that make the hull.

Convex Hull

After that, find the convexity defects as shown below. This will give us the array as discussed above.

Since the defects array contains the indices of the contour points so, in order to visualize the defects, we need to convert them to coordinates. Here, we iterate over the array rows and draw a line joining the start point and end point, then draw a circle at the farthest point. This is done as below.

Convexity Defects

I hope you understood the convexity defects and how OpenCV calculates these defects. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Perspective Transformation

In this blog, we will discuss what is perspective transformation and how to perform this transformation using OpenCV-Python. So, let’s get started.

What is Perspective Transformation?

As clear from the name, the perspective transformation is associated with the change in the viewpoint. This type of transformation does not preserve parallelism, length, and angle. But they do preserve collinearity and incidence. This means that the straight lines will remain straight even after the transformation.

In general, the perspective transformation can be expressed as

Here, (x’, y’) are the transformed points while (x, y) are the input points. The transformation matrix (M) can be seen as a combination of

For affine transformation, the projection vector is equal to 0. Thus, affine transformation can be considered as a particular case of perspective transformation.

Since the transformation matrix (M) is defined by 8 constants(degree of freedom), thus to find this matrix we first select 4 points in the input image and map these 4 points to the desired locations in the unknown output image according to the use-case (This way we will have 8 equations and 8 unknowns and that can be easily solved).

Once the transformation matrix is calculated, then we apply the perspective transformation to the entire input image to get the final transformed image. Let’s see how to do this using OpenCV-Python.

OpenCV

OpenCV provides a function cv2.getPerspectiveTransform() that takes as input the 4 pairs of corresponding points and outputs the transformation matrix. The basic syntax is shown below.

Once the transformation matrix (M) is calculated, pass it to the cv2.warpPerspective() function that applies the perspective transformation to an image. The syntax of this function is given below.

Now, let’s take a very classic example of perspective transform to obtain a top-down, “birds-eye view” of an image. Let’s understand step by step how to perform perspective transform using the below image.

Below is the image showing the basic idea which we will perform.

First, we will select the representative points (usually the corner points) for our region of interest (ROI). This can be done manually using matplotlib as shown below.

The above code opens up an interactive window. Use the mouse pointer to obtain the 4 corner coordinates which are shown below. Points are ordered anti-clockwise as shown in the above figure.

Then we will map these 4 points to the desired locations in the unknown output image according to the use-case. Here, since we used the corner points so we will derive the width and height of the output image from these 4 points as shown below (You can also manually specify the mapping). Below I have used the L2 norm. You can use L1 also.

Now specify the mapping as shown in the image below (same image as above).

Now, compute the transformation matrix (M) using the cv2.getPerspectiveTransform() function as shown below

After calculating the transformation matrix, apply the perspective transformation to the entire input image to get the final transformed image.

Below is the transformed image.

That’s all for this blog. In the next blog, we will discuss how to automatically choose the 4 points using the contours. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Affine Transformation

In this blog, we will discuss what is affine transformation and how to perform this transformation using OpenCV-Python. So, let’s get started.

What is an Affine Transformation?

An affine transformation is any transformation that preserves collinearity, parallelism as well as the ratio of distances between the points (e.g. midpoint of a line remains the midpoint after transformation). It doesn’t necessarily preserve distances and angles.

Thus all the geometric transformations we discussed so far such as translation, rotation, scaling, etc are all affine transformations as all the above properties are preserved in these transformations. To understand in simple terms, one can think of the affine transformation as a composition of rotation, translation, scaling, and shear.

In general, the affine transformation can be expressed in the form of a linear transformation followed by a vector addition as shown below

Since the transformation matrix (M) is defined by 6 (2×3 matrix as shown above) constants, thus to find this matrix we first select 3 points in the input image and map these 3 points to the desired locations in the unknown output image according to the use-case as shown below (This way we will have 6 equations and 6 unknowns and that can be easily solved).

For instance, if you want to take the mirror image, you can define the 3 points as (you may choose any 3).

Once the transformation matrix is calculated, then we apply the affine transformation to the entire input image to get the final transformed image. Let’s see how to do this using OpenCV-Python.

OpenCV

OpenCV provides a function cv2.getAffineTransform() that takes as input the three pairs of corresponding points and outputs the transformation matrix. The basic syntax is shown below.

Once the transformation matrix (M) is calculated, pass it to the cv2.warpAffine() function that applies an affine transformation to an image. The syntax of this function is given below.

Now, let’s take the above example of a mirror image and see how to apply affine transformation using OpenCV-Python. Below are the steps.

  • Read the image
  • Define the 3 pairs of corresponding points (See image above)
  • Calculate the transformation matrix using cv2.getAffineTransform()
  • Apply the affine transformation using cv2.warpAffine()

Below is the output image. Here, left image represents the original image while the right one is the transformed mirror image.

Now define the 3 different point pairs and see how the transformation looks like. That’s all for this blog. In the next blog, we will discuss Perspective transformation. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

OpenCV Minimum Area Rectangle

In the previous blog, we discussed image moments and how different contour features such as area, centroid, etc. can be extracted from them. In this blog, we will learn how to draw a minimum area rectangle around the region of interest. This is one of the most common tasks performed when working with contours. So, let’s get started.

Minimum Area Rectangle

As is clear from the name, the bounding rectangle is drawn with a minimum area. Because of this, rotation is also considered. The below image shows 2 rectangles, the green one is the normal bounding rectangle while the red one is the minimum area rectangle. See how the red rectangle is rotated.

Source: OpenCV

OpenCV provides a function cv2.minAreaRect() for finding the minimum area rotated rectangle. This takes as input a 2D point set and returns a Box2D structure which contains the following details – (center(x, y), (width, height), angle of rotation). The syntax is given below.

But to draw a rectangle, we need 4 corners of the rectangle. So, to convert the Box2D structure to 4 corner points, OpenCV provides another function cv2.boxPoints(). This takes as input the Box2D structure and returns the 4 corner points. The 4 corner points are ordered clockwise starting from the point with the highest y. The syntax is given below.

Before drawing the rectangle, you need to convert the 4 corners to the integer type. You can use np.int32 or np.int64 (Don’t use np.int8 as it permits value up to 127 and leads to truncation after that). Sometimes, you might see np.int0 used, don’t get confused, this is equivalent to np.int32 or np.int64 depending upon your system architecture. The full code is given below.

Once the 4 coordinates are obtained, you can easily draw the rectangle. Now, let’s discuss about the angle of rotation.

Angle of Rotation

As we already discussed that the 4 corner points are ordered clockwise starting from the point with the highest y as shown below. If 2 points have the same highest y, then the rightmost point is the starting point. The points are numbered as 0,1,2,3 (0-starting, 3-end).

So, the angle of rotation given by OpenCV’s cv2.minAreaRect() is actually the angle between the line (joining the starting and endpoint) and the horizontal as shown below.

Thus the angle value always lies between [-90,0). Why? because if the object is rotated more than 90 degrees, then the next edge is used to calculate the angle from the horizontal. And thus the calculated angle always lies between [-90,0). See the below image where the green line shows the line joining the starting and endpoint that is used to calculate the angle. Also, see how the starting and endpoint change when the object is rotated. The points are numbered as 0,1,2,3 (0-starting, 3-end).

The actual angle is the angle by which the object is rotated and the calculated angle is the angle returned by the cv2.minAreaRect(). I hope you understood how the angle is calculated and why it lies between [-90,0).

In the next blog, we will discuss how to deskew the region of interest using the angle obtained by the cv2.minAreaRect(). Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.