Tag Archives: image processing

Understanding Geometric Transformation: Translation using OpenCV-Python

In this blog, we will discuss image translation, one of the most basic geometric transformations performed on images. So, let’s get started.

Translation is simply the shifting of an object’s location. Suppose we have a point P(x, y) which is translated by (tx, ty); then the coordinates after translation, denoted by P'(x', y'), are given by

x' = x + tx
y' = y + ty

or, in matrix form,

[x', y']^T = M [x, y, 1]^T,  where  M = [[1, 0, tx],
                                         [0, 1, ty]]

So, we just need to create the transformation matrix (M) and then we can translate any point as shown above. That’s the basic idea behind translation. So, let’s first discuss how to do image translation using numpy for better understanding, and then we will see a more sophisticated implementation using OpenCV.

Numpy

First, let’s create the transformation matrix (M). This can be easily done using numpy as shown below. Here, the image is translated by (100, 50)
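
A minimal sketch using numpy (here tx and ty are the shifts along the x and y axes):

    import numpy as np

    tx, ty = 100, 50
    M = np.array([[1, 0, tx],
                  [0, 1, ty]])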

Next, let’s convert the image coordinates to the form [x,y,1]. This can be done as
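
Assuming the image has already been read into img (say, with cv2.imread), one way to do this is:

    h, w = img.shape[:2]
    ys, xs = np.indices((h, w))      # y (row) and x (column) index grids
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w, dtype=int)])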

Now apply the transformation by multiplying the transformation matrix with coordinates.
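
Continuing the sketch above:

    x_new, y_new = M @ coords       # the translated x' and y' coordinates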

Keep only the coordinates that fall within the image boundary.
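
Something like:

    valid = (x_new >= 0) & (x_new < w) & (y_new >= 0) & (y_new < h)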

Now, create a zeros image similar to the original image and project all the points onto the new image.
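
For instance:

    translated = np.zeros_like(img)
    translated[y_new[valid], x_new[valid]] = img[ys.ravel()[valid], xs.ravel()[valid]]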

Display the final image.
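
Using OpenCV’s display helpers:

    cv2.imshow('Translated', translated)
    cv2.waitKey(0)
    cv2.destroyAllWindows()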

The full code can be found below
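
Here is the full sketch assembled from the steps above (the image path is a placeholder):

    import cv2
    import numpy as np

    img = cv2.imread('image.jpg')

    # Transformation matrix for a shift of (100, 50)
    tx, ty = 100, 50
    M = np.array([[1, 0, tx],
                  [0, 1, ty]])

    # All pixel coordinates in homogeneous form [x, y, 1]
    h, w = img.shape[:2]
    ys, xs = np.indices((h, w))
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w, dtype=int)])

    # Apply the translation and keep only the in-bounds points
    x_new, y_new = M @ coords
    valid = (x_new >= 0) & (x_new < w) & (y_new >= 0) & (y_new < h)

    # Project the valid points onto a blank image of the same size
    translated = np.zeros_like(img)
    translated[y_new[valid], x_new[valid]] = img[ys.ravel()[valid], xs.ravel()[valid]]

    cv2.imshow('Original', img)
    cv2.imshow('Translated', translated)
    cv2.waitKey(0)
    cv2.destroyAllWindows()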

Below is the output. Here, the left image is the original while the right one is the translated image.

OpenCV-Python

Now, let’s discuss how to translate images using OpenCV-Python.

OpenCV provides a function cv2.warpAffine() that applies an affine transformation to an image. You just need to provide the transformation matrix (M). The basic syntax for the function is given below.
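
In the Python bindings the basic call looks like this (optional arguments such as flags and borderMode are omitted):

    dst = cv2.warpAffine(src, M, dsize)

Here, src is the input image, M is the 2x3 transformation matrix, and dsize is the output size given as (width, height).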

Below is a sample code where the image is translated by (100, 50).
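
A minimal sketch (the image path is a placeholder; note that warpAffine expects M as a floating-point matrix):

    import cv2
    import numpy as np

    img = cv2.imread('image.jpg')
    h, w = img.shape[:2]

    M = np.float32([[1, 0, 100],
                    [0, 1, 50]])
    translated = cv2.warpAffine(img, M, (w, h))

    cv2.imshow('Original', img)
    cv2.imshow('Translated', translated)
    cv2.waitKey(0)
    cv2.destroyAllWindows()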

Below is the output. Here, the left image is the original while the right one is the translated image.

Compare the outputs of both implementations. That’s all for image translation. In the next blog, we will discuss another geometric transformation, known as rotation, in detail. Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Finding Convex Hull OpenCV Python

In the previous blog, we discussed how to perform simple shape detection using contour approximation. In this blog, we will discuss how to find the convex hull of a given shape/curve. So, let’s first discuss what a convex hull is.

What is a Convex Hull?

Any region/shape is said to be convex if the line joining any two points (selected from the region) is contained entirely in that region. Another way of saying this is: for a shape to be convex, all of its interior angles must be less than 180 degrees, or all the vertices should open towards the center. Let’s understand this with the help of the image below.

Convex vs concave

Now, for a given shape or set of points, we can have many convex curves/boundaries. The smallest, tightest-fitting convex boundary is known as the convex hull.

Convex Hull

Now, the next question that comes to our mind is how to find the convex hull for a given shape or set of points. There are many algorithms for finding the convex hull. Some of the most common, with their associated time complexities, are shown below. Here, n is the number of input points and h is the number of points on the hull.
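
A few standard examples from the literature (this list is illustrative, not exhaustive):

  • Gift wrapping (Jarvis march): O(nh)
  • Graham scan: O(n log n)
  • Quickhull: O(n log n) on average, O(n^2) in the worst case
  • Chan’s algorithm: O(n log h)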

OpenCV provides a builtin function for finding the convex hull of a point set as shown below
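
The signature, paraphrased from the OpenCV docs (square brackets mark optional arguments):

    hull = cv2.convexHull(points[, clockwise[, returnPoints]])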

  • points: any contour or input 2D point set whose convex hull we want to find.
  • clockwise: If True, the output convex hull is oriented clockwise; otherwise, counter-clockwise.
  • returnPoints: If True (default), returns the coordinates of the hull points; otherwise, returns the indices of the contour points corresponding to the hull points. Thus, to find the actual hull coordinates in the second (False) case, we need to do contour[indices].

Now, let’s take an example and understand how to find the convex hull for a given image using OpenCV-Python.

sample image for finding Convex Hull

Steps:

  • Load the image
  • Convert it to greyscale
  • Threshold the image
  • Find the contours
  • For each contour, find the convex hull and draw it, as shown in the code below.
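
A minimal sketch of these steps (OpenCV 4.x return signature; the image path and threshold value are placeholders):

    import cv2

    # Load the image
    img = cv2.imread('sample.jpg')

    # Convert it to greyscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Threshold the image
    ret, thresh = cv2.threshold(gray, 50, 255, cv2.THRESH_BINARY)

    # Find the contours (OpenCV 3.x returns an extra first value)
    contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # For each contour, find the convex hull and draw it
    for cnt in contours:
        hull = cv2.convexHull(cnt)
        cv2.drawContours(img, [hull], -1, (0, 0, 255), 2)

    cv2.imshow('Convex Hull', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()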

Below is the output of the above code.

Convex Hull output

Applications:

  • Collision detection or avoidance.
  • Face Swap
  • Shape analysis and many more.

Hope you enjoy reading.

If you have any doubts/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Contour Tracing

In the previous blogs, we discussed various image segmentation methods which result in partitioning the image into sub-regions. Now, the next task is to represent and describe these regions in a form suitable for further image processing tasks such as pattern classification or recognition, etc. One can represent these regions either in terms of the boundary (external features) or in terms of the pixels comprising the regions (internal features). So, in this blog, we will discuss one such representation known as contours.

A contour, in simple terms, is a curve joining all the continuous points (along the boundary) having some similar property, such as intensity. Once the contours are extracted, we can use them for shape analysis, various object detection and recognition tasks, etc. So, let’s discuss different contour tracing (i.e. detecting the boundary of a region) algorithms. Some of the most common algorithms are

Square Tracing algorithm

This was one of the first approaches to extract contours and is quite simple. Suppose the background is black (0’s) and the object is white (1’s). Start iterating over the binary or segmented image row by row, from left to right. Once you hit a white pixel (i.e. 1), go left; otherwise go right. Here, left and right are relative to the direction in which you entered the current pixel. The stopping condition is entering the starting pixel a second time in the same direction you entered it initially. This works best with 4-connected patterns, as it only checks left and right and so can miss diagonally connected pixels.
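
For illustration, here is a rough numpy sketch of the idea (the start heading and the termination bookkeeping are my own choices; a production implementation would be more careful):

    import numpy as np

    def square_trace(img):
        # Square tracing on a binary image (object = 1, background = 0).
        # A sketch for 4-connected objects; may record some pixels twice.
        headings = [(-1, 0), (0, 1), (1, 0), (0, -1)]   # up, right, down, left

        def turn(h, d):              # d = -1 for a left turn, +1 for a right turn
            return headings[(headings.index(h) + d) % 4]

        def is_object(p):
            y, x = p
            return 0 <= y < img.shape[0] and 0 <= x < img.shape[1] and img[y, x] == 1

        ys, xs = np.nonzero(img)
        start = (int(ys[0]), int(xs[0]))     # first object pixel in a row-by-row scan
        boundary = [start]

        heading = turn((0, 1), -1)           # on an object pixel: turn left and step
        pos = (start[0] + heading[0], start[1] + heading[1])
        first_state = (pos, heading)         # how we first left the start pixel

        while True:
            if is_object(pos):
                boundary.append(pos)
                heading = turn(heading, -1)  # white pixel: go left
            else:
                heading = turn(heading, +1)  # black pixel: go right
            pos = (pos[0] + heading[0], pos[1] + heading[1])
            if (pos, heading) == first_state:   # re-entered the start the same way
                return boundary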

Moore Boundary Tracing algorithm

Start iterating row by row from left to right until you hit an object pixel. Then traverse that pixel’s 8-connected neighbourhood in the clockwise direction, starting from the background pixel visited just before the object pixel. The stopping criterion is the same as above. This removes the limitations of the previous method.

Radial Sweep

This is similar to the Moore algorithm. After performing the first step of the Moore algorithm, draw a line segment connecting the two object pixels found. Rotate this line segment in the clockwise direction until an object pixel is found in the 8-neighbourhood. Again draw the line segment and rotate. The stopping criterion is encountering the starting pixel a second time with the same next pixel. For a demonstration, please refer to this.

These are a few of the algorithms for contour tracing. In the next blog, we will discuss Suzuki’s algorithm, the one that OpenCV uses for finding and drawing contours. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

References: Wikipedia, Imageprocessingplace

Integral images

In this blog, we will discuss the concept of integral images (or summed-area tables, in general) that lets us efficiently compute statistics like the mean, standard deviation, etc. in any rectangular window. This was introduced in 1984 by Frank Crow, but it became popular due to its use in template matching and object detection (Source). So, let’s first discuss what an integral image is, then discuss why it is efficient and how to compute the statistics from it.

The integral image is obtained by summing all the pixels before each pixel (naively, you can think of this as similar to the cumulative distribution function, where a particular value is obtained by summing all the values before it). Let’s take an example to understand this.

Suppose we have a 5×5 binary image as shown below. The integral image is shown on the right.

All the pixels in the integral image are obtained by summing all the previous pixels. “Previous” here means all the pixels above and to the left of that pixel (inclusive of that pixel). For instance, the 3 (blue circle) is obtained by adding that pixel to the pixels above it and to its left in the input image, i.e. 1+0+0+1+0+0+0+1 = 3.

Finding the sum of pixels

Once the integral image is obtained, the sum of pixels in any rectangular region can be obtained in constant time (O(1) time complexity) by the following expression:

Sum = Bottom right + top left – top right – bottom left

For instance, the sum of all the pixels in the rectangular window can be obtained easily from the integral image using the above expression as shown below.

Here, the top right (denoted by B) is 2, not 3. Be careful: we are taking the integral sum up to that point. For ease of visualization, we can take a 4×4 window in the integral image and then perform the sum. For boundary pixels, pad with 0’s.

Now the mean can be calculated easily by dividing the sum by the total number of pixels in that window. The standard deviation for any window can be obtained by the following formula, which comes from simply expanding the variance formula (see Wikipedia):

std = sqrt(S2/n - (S1/n)^2)

Here, S1 is the sum of the rectangular region in the input image, S2 is the sum of the squares of that region in the input image, and n is the number of pixels in that region. Both S1 and S2 can be found easily using the integral image. Now, let’s discuss how to implement this using OpenCV-Python. Let’s first discuss the builtin functions provided by OpenCV to calculate the integral image.
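
The first one (paraphrased from the OpenCV docs):

    dst = cv2.integral(src[, sdepth])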

Here, src is the input image and sdepth is the optional argument denoting the depth of the integral image (must be of type CV_32S, CV_32F, or CV_64F). This returns an integral image of size (W+1)x(H+1), i.e. one more than the input image in each dimension. The first row and column of the integral image are all 0’s, to deal with the boundary pixels as explained above. All the remaining pixels are obtained by summing all the previous pixels.
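
For instance, a quick sketch of the window-sum trick (the random 5x5 array stands in for the example image above):

    import cv2
    import numpy as np

    img = np.random.randint(0, 2, (5, 5), dtype=np.uint8)   # a 5x5 binary image
    ii = cv2.integral(img)           # shape (6, 6); first row and column are 0

    # Sum over rows r1..r2 and columns c1..c2 (inclusive) of img
    r1, c1, r2, c2 = 1, 1, 3, 3
    window_sum = ii[r2 + 1, c2 + 1] - ii[r1, c2 + 1] - ii[r2 + 1, c1] + ii[r1, c1]
    print(window_sum, img[r1:r2 + 1, c1:c2 + 1].sum())      # the two should match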

OpenCV also provides a function that returns the integral image of both the input image and its square. This can be done by the following function.
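
Paraphrased from the OpenCV docs:

    sum, sqsum = cv2.integral2(src[, sdepth[, sqdepth]])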

Here, sqdepth is the depth of the integral of the squared image (must be of type CV_32F or CV_64F). This returns 2 arrays representing the integral of the input image and of its square.

Calculate Standard deviation

Let’s verify that the standard deviation calculated by the above formula yields correct results. For this, we will also calculate the standard deviation using the builtin cv2.meanStdDev() function and then compare the results. Below is the code for this.
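
A minimal verification sketch using a random image (for a sub-window, you would use the four-corner expression instead of the last entry of the integral image):

    import cv2
    import numpy as np

    img = np.random.randint(0, 256, (100, 100)).astype(np.float32)

    ii, ii_sq = cv2.integral2(img)

    n = img.size
    s1 = ii[-1, -1]                  # S1: sum of all the pixels
    s2 = ii_sq[-1, -1]               # S2: sum of all the squared pixels
    std = np.sqrt(s2 / n - (s1 / n) ** 2)

    mean, std_builtin = cv2.meanStdDev(img)
    print(std, std_builtin[0][0])    # the two values should agree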

Thus, calculating the integral image is a simple operation that lets us compute image statistics super fast. Later we will learn how this can be very useful in template matching, face detection, etc. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Image Pyramids

An image pyramid is a way of representing an image at multiple resolutions. The idea behind this is that features that may go undetected at one resolution can be easily detected at some other resolution. For instance, if the region of interest is large, a low-resolution image or coarse view is sufficient, while small objects benefit from being examined at high resolution. Now, if both large and small objects are present in an image, analyzing the image at several resolutions can prove beneficial. This is the main concept behind image pyramids. The name “pyramid” comes from the fact that if you place the high-resolution image at the bottom and stack the subsequent lower-resolution images on top, the appearance resembles a pyramid.

Thus, constructing an image pyramid is equivalent to repeatedly smoothing and subsampling (halving the size of) an image. This is illustrated in the image below.

Source: Wikipedia

Why blurring? Because this reduces the aliasing or ringing effects that may arise if we downsample directly. The pyramid is named after the type of blurring applied: if we apply a mean filter, it is known as a mean pyramid; a Gaussian filter gives a Gaussian pyramid; and if we don’t apply any filtering, it is known as a subsampling pyramid, etc. For subsampling, we can use any interpolation algorithm, such as nearest neighbor, bilinear, bicubic, etc. In this blog, we will discuss only two kinds of image pyramids

  • Gaussian Pyramid
  • Laplacian Pyramid

Building a Gaussian pyramid involves repeatedly Gaussian-blurring and downsampling an image until some stopping criterion is met. For instance, one stopping criterion can be a minimum image size. OpenCV provides a builtin function to perform the blurring and downsampling, as shown below.
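
Paraphrased from the OpenCV docs:

    dst = cv2.pyrDown(src[, dstsize[, borderType]])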

Here, src is the source image and the rest are optional arguments, including the output size (dstsize) and the border type. By default, the size of the output image is computed as Size((src.cols+1)/2, (src.rows+1)/2), i.e. each dimension is halved (so the area drops to one-fourth) at each step.

This function first convolves the input image with a 5×5 Gaussian kernel and then downsamples it by rejecting the even rows and columns. Below is an example of how to use the above function.
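
A minimal sketch (the image path and the number of levels are placeholders):

    import cv2

    img = cv2.imread('image.jpg')

    # Build a 4-level Gaussian pyramid
    layer = img
    pyramid = [layer]
    for i in range(3):
        layer = cv2.pyrDown(layer)
        pyramid.append(layer)

    for i, level in enumerate(pyramid):
        cv2.imshow('level {}'.format(i), level)
    cv2.waitKey(0)
    cv2.destroyAllWindows()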

Now, let’s discuss the Laplacian pyramid. Since the Laplacian is a high-pass filter, at each level of this pyramid we get an edge image as output. As we discussed in the edge detection blog, the Laplacian can be approximated using the difference of Gaussians. So, here we will take advantage of this fact and obtain the Laplacian pyramid by subtracting adjacent Gaussian pyramid levels. Thus, the Laplacian at a level is obtained by subtracting the expanded version of the upper Gaussian level from that Gaussian level. This is illustrated in the figure below.

OpenCV also provides a function to go back down the image pyramid, i.e. to expand a particular level, as shown in the figure above.
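
Paraphrased from the OpenCV docs:

    dst = cv2.pyrUp(src[, dstsize[, borderType]])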

This upsamples the input image by injecting even zero rows and columns and then convolves the result with the 5×5 Gaussian kernel multiplied by 4. By default, the output image size is computed as Size(src.cols*2, src.rows*2). Let’s take an example to illustrate the Laplacian pyramid.

Steps:

  • First load the image
  • Then construct the Gaussian pyramid with 3 levels.
  • For the Laplacian pyramid, the topmost level remains the same as in the Gaussian pyramid. The remaining levels are constructed from top to bottom by subtracting the expanded upper level from the corresponding Gaussian level, as shown in the sketch below.
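
A sketch of these steps (the image path is a placeholder; dstsize keeps the expanded level’s size in sync with odd dimensions):

    import cv2

    img = cv2.imread('image.jpg')

    # Gaussian pyramid with 3 levels
    G = [img]
    for i in range(2):
        G.append(cv2.pyrDown(G[-1]))

    # Laplacian pyramid: topmost level equals the Gaussian top;
    # the rest are differences with the expanded upper level
    L = [G[-1]]
    for i in range(len(G) - 1, 0, -1):
        size = (G[i - 1].shape[1], G[i - 1].shape[0])
        expanded = cv2.pyrUp(G[i], dstsize=size)
        L.append(cv2.subtract(G[i - 1], expanded))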

The Laplacian pyramid is mainly used for image compression. Image pyramids can also be used for image blending and for image enhancement which we will discuss in the next blog. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Image Blending using Image Pyramids

In the previous blog, we discussed image pyramids and how to construct a Laplacian pyramid from the Gaussian one. In this blog, we will discuss how image pyramids can be used for image blending. This produces more visually appealing results compared to the blending methods we have discussed so far. Below are the steps for image blending using image pyramids.

Steps:

  1. Load the two images and the mask.
  2. Find the Gaussian pyramid for the two images and the mask.
  3. From the Gaussian pyramid, calculate the Laplacian pyramid for the two images as explained in the previous blog.
  4. Now, blend each level of the Laplacian pyramid according to the mask image of the corresponding Gaussian level.
  5. From this blended Laplacian pyramid, reconstruct the final image. This is done by expanding each level and adding it to the level below, as shown in the figure below. Here LS0, LS1, LS2, and LS3 are the levels of the blended Laplacian pyramid obtained in step 4.

Now, let’s implement the above steps using OpenCV-Python. Suppose we want to blend the two images corresponding to the mask as shown below.

Mask Image

So, we will clip the jet from the second image and blend it into the first image. Below is the code for the steps explained above.
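
A sketch of the whole pipeline (file names and the pyramid depth are placeholders; the mask is assumed white where the second image should show through):

    import cv2
    import numpy as np

    levels = 4

    img1 = cv2.imread('image1.jpg').astype(np.float32)
    img2 = cv2.imread('image2.jpg').astype(np.float32)
    mask = cv2.imread('mask.jpg').astype(np.float32) / 255.0

    # Steps 1-2: Gaussian pyramids for the two images and the mask
    gp1, gp2, gpm = [img1], [img2], [mask]
    for i in range(levels):
        gp1.append(cv2.pyrDown(gp1[-1]))
        gp2.append(cv2.pyrDown(gp2[-1]))
        gpm.append(cv2.pyrDown(gpm[-1]))

    # Step 3: Laplacian pyramids (smallest level first)
    lp1, lp2 = [gp1[-1]], [gp2[-1]]
    for i in range(levels, 0, -1):
        size = (gp1[i - 1].shape[1], gp1[i - 1].shape[0])
        lp1.append(gp1[i - 1] - cv2.pyrUp(gp1[i], dstsize=size))
        lp2.append(gp2[i - 1] - cv2.pyrUp(gp2[i], dstsize=size))

    # Step 4: blend each Laplacian level with the Gaussian level of the mask
    blended = []
    for l1, l2, gm in zip(lp1, lp2, gpm[::-1]):
        blended.append(l1 * (1.0 - gm) + l2 * gm)

    # Step 5: reconstruct by expanding each level and adding the one below
    result = blended[0]
    for i in range(1, len(blended)):
        size = (blended[i].shape[1], blended[i].shape[0])
        result = cv2.pyrUp(result, dstsize=size) + blended[i]

    cv2.imwrite('blended.jpg', np.clip(result, 0, 255).astype(np.uint8))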

The blended output is shown below

Still, there is some amount of white haze around the jet. Later, we will discuss gradient-domain blending methods, which improve the result even more. Now, compare this image with a simple copy-and-paste operation and see the difference.

You can do a side-by-side blending also. In the next blog, we will discuss how to perform image enhancement and image compression using the Laplacian pyramids. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Earth Mover’s Distance (EMD)

In the previous blogs, we discussed various histogram comparison methods for image retrieval. Most of the methods we discussed were highly sensitive to blurring, local deformations, color shifts, etc. In this blog, we will discuss a more robust method for comparing distributions, known as the Wasserstein metric or Earth Mover’s Distance. So, let’s get started.

In a formal sense, the notion of distance is more suited to single elements than to distributions. For instance, suppose I ask: what is the distance from your house to the neighbor’s house? Most of you will come up with a number, say x meters, but where does this distance come from? Is it the distance between the centers of the two houses, the nearest distance between them, or some other measure? Thus, the definition of distance becomes less apparent when we are dealing with distributions or sets of elements rather than single elements. So, in this blog, we will discuss the Earth Mover’s Distance, also known as the Wasserstein metric, which is more suitable for finding the distance or similarity between distributions. This concept was first introduced by Gaspard Monge in 1781, in the context of transportation theory (Wikipedia). Let’s discuss the main concept behind this.

Let’s say we have two distributions A and B whose distance we want to calculate. EMD views one distribution as a mass of earth (a pile of dirt) spread over the space and the other as a collection of holes in that same space. The least amount of work needed to fill the holes completely gives us the EMD; filling the holes converts one distribution into the other. The smaller the distance, the more similar the distributions, and vice versa.

Mathematically, we construct a matrix, say M, each element of which denotes the amount of weight transferred (or matched) between the distributions. For instance, Mij denotes the weight transferred from the ith position in the first distribution to the jth position in the second. The work done is the weight transferred multiplied by the distance, i.e. Mij*dij. Thus the EMD is given by the minimum total work normalized by the total flow:

EMD(A, B) = (Σij Mij*dij) / (Σij Mij)

An important term used with EMD is the signature, which is simply a way of representing a distribution. We divide a distribution into clusters, each represented by its mean (or some other statistic) and the fraction of the distribution in that cluster. This representation by a set of clusters is called the signature.

Now, let’s see how to implement this. OpenCV provides a builtin function for calculating EMD as shown below.
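
Paraphrased from the OpenCV docs:

    retval, lowerBound, flow = cv2.EMD(signature1, signature2, distType[, cost[, lowerBound]])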

Here, a signature is a matrix with one row per point (so, total number of pixels rows) and (number of dimensions + 1) columns. For a greyscale image we have 2 dimensions (the x and y coordinates), while for a color image we have 3. Thus for greyscale, each row of the signature holds the pixel value followed by its coordinates. To calculate the distance, you can use any metric, such as L1 or L2, passed as cv2.DIST_L1, etc. You can pass a cost matrix using the cost argument. The lower bound is the distance between the centers of mass of the two signatures. The function returns the work (the EMD itself), the lower bound, and the flow matrix (the M discussed above).

Now, let’s take an example to understand this. First, you need to change the images to their corresponding signatures as shown below.
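
One simple way to do this is the helper sketched below (not a builtin; it takes each pixel’s value as the weight and its row and column as the coordinates):

    import numpy as np

    def img_to_sig(arr):
        # Convert a 2D greyscale array into an EMD signature:
        # one row per pixel, holding (weight, x, y) as float32.
        sig = np.empty((arr.size, 3), dtype=np.float32)
        idx = 0
        for i in range(arr.shape[0]):
            for j in range(arr.shape[1]):
                sig[idx] = np.array([arr[i, j], i, j])
                idx += 1
        return sig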

and then calculate the EMD as shown below.
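
Something like this (img1 and img2 are assumed to be small greyscale arrays):

    import cv2

    sig1 = img_to_sig(img1)
    sig2 = img_to_sig(img2)
    work, lower_bound, flow = cv2.EMD(sig1, sig2, cv2.DIST_L2)
    print(work)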

Below is the output we got

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Comparing Histograms using OpenCV-Python

In the previous blogs, we discussed a lot about histograms. We learned histogram equalization, histogram matching to a specified histogram, back-projecting a histogram to find regions of interest, and even used a histogram for performing image thresholding. In this blog, we will learn how to compare histograms for a notion of similarity. This comparison is possible because we can classify a number of things around us based on color. We will learn various single-number evaluation metrics that tell how well two histograms match each other. So, let’s get started.

The histogram comparison methods can be classified into two categories

  • Bin-to-Bin comparison
  • Cross-bin comparison

Bin-to-bin comparison methods include the L1 and L2 norms for calculating bin distances, bin intersection, etc. These methods assume that the histogram domains are aligned, but this condition is easily violated in most cases due to changes in lighting conditions, quantization, etc. Cross-bin comparison methods are more robust and discriminative, but they can be computationally expensive. To circumvent this, one can reduce a cross-bin comparison to a bin-to-bin one. Cross-bin comparison methods include the Earth Mover’s Distance (EMD), quadratic-form distances (which take into account the bin similarity matrix), etc.

OpenCV provides a builtin function for comparing the histograms as shown below.
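
Paraphrased from the OpenCV docs:

    retval = cv2.compareHist(H1, H2, method)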

Here, H1 and H2 are the histograms we want to compare and the “method” argument specifies the comparison method. OpenCV provides several built-in methods for histogram comparison as shown below

  • HISTCMP_CORREL: Correlation
  • HISTCMP_CHISQR: Chi-Square
  • HISTCMP_CHISQR_ALT: Alternative Chi-Square
  • HISTCMP_INTERSECT: Intersection
  • HISTCMP_BHATTACHARYYA: Bhattacharyya distance
  • HISTCMP_HELLINGER: Synonym for HISTCMP_BHATTACHARYYA
  • HISTCMP_KL_DIV: Kullback-Leibler divergence

For the Correlation and Intersection methods, the higher the metric, the better the match, while for the Chi-Square and Bhattacharyya methods, a lower metric value represents a better match. Now, let’s take an example to understand how to use this function. Here, we will compare the two images shown below.

Steps:

  • Load the images
  • Convert them into any suitable color model
  • Calculate the image histogram (2D or 3D histograms are better) and normalize it
  • Compare the histograms using the above function, as shown in the code below
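
A minimal sketch of these steps (file names, the color model, and the bin counts are placeholders):

    import cv2

    img1 = cv2.imread('image1.jpg')
    img2 = cv2.imread('image2.jpg')

    hsv1 = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)
    hsv2 = cv2.cvtColor(img2, cv2.COLOR_BGR2HSV)

    # 2D histograms over Hue and Saturation, then normalization
    hist1 = cv2.calcHist([hsv1], [0, 1], None, [180, 256], [0, 180, 0, 256])
    hist2 = cv2.calcHist([hsv2], [0, 1], None, [180, 256], [0, 180, 0, 256])
    cv2.normalize(hist1, hist1, 0, 1, cv2.NORM_MINMAX)
    cv2.normalize(hist2, hist2, 0, 1, cv2.NORM_MINMAX)

    score = cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL)
    print(score)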

The metric value comes out to be around 0.99, which seems pretty good. Try changing the bin sizes and the comparison methods and observe the change. In the next blog, we will discuss the Earth Mover’s Distance (EMD), a cross-bin comparison method that is more robust than these methods. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Add borders to the image using OpenCV-Python

In this blog, we will learn how to add different borders to an image using OpenCV-Python. Adding a border doesn’t just make the image look stylish; it is also useful in many image processing tasks such as image interpolation, morphological operations, edge detection, etc. OpenCV provides different border styles, and in this blog, we will explore these. Below is the inbuilt function provided by OpenCV for this.
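
Paraphrased from the OpenCV docs:

    dst = cv2.copyMakeBorder(src, top, bottom, left, right, borderType[, value])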

Here, src is the input image, and top, bottom, left, and right specify how many pixels to add on each side. “borderType” specifies what type of border to add. Below are the types available in OpenCV.

  • cv2.BORDER_REFLECT: this reflects the border elements such as fedcba|abcdefgh|hgfedcb
  • cv2.BORDER_REFLECT_101: this reflects leaving the border pixel such as gfedcb|abcdefgh|gfedcba
  • cv2.BORDER_REPLICATE: Border pixel will be replicated such as aaaaaa|abcdefgh|hhhhhhh
  • cv2.BORDER_WRAP: this wraps around, taking pixels from the opposite boundary, as in cdefgh|abcdefgh|abcdefg
  • cv2.BORDER_CONSTANT: this adds a constant border whose value is given by the “value” argument.

Now, let’s take an example to illustrate this. Here, I have created a trackbar that lets us understand this clearly.
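
A sketch along those lines (the image path is a placeholder; press Esc to quit):

    import cv2

    border_types = [cv2.BORDER_REFLECT, cv2.BORDER_REFLECT_101,
                    cv2.BORDER_REPLICATE, cv2.BORDER_WRAP, cv2.BORDER_CONSTANT]

    img = cv2.imread('image.jpg')
    cv2.namedWindow('border')
    cv2.createTrackbar('type', 'border', 0, len(border_types) - 1, lambda x: None)
    cv2.createTrackbar('size', 'border', 10, 100, lambda x: None)

    while True:
        t = cv2.getTrackbarPos('type', 'border')
        s = cv2.getTrackbarPos('size', 'border')
        # The value argument is only used by BORDER_CONSTANT
        out = cv2.copyMakeBorder(img, s, s, s, s, border_types[t], value=(0, 255, 0))
        cv2.imshow('border', out)
        if cv2.waitKey(1) & 0xFF == 27:
            break
    cv2.destroyAllWindows()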

Play around with these trackbars to get a feel for the different border types. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Thresholding using cv2.inRange() function

In the previous blogs, we discussed various thresholding methods such as Otsu, adaptive, BHT, etc. In this blog, we will learn how to segment out a particular region or color from an image. This is naively equivalent to multiple thresholding, where we assign a particular value to the region falling between two thresholds and a different value to the remaining region. OpenCV provides an inbuilt function for this, as shown below.
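
Paraphrased from the OpenCV docs:

    dst = cv2.inRange(src, lowerb, upperb)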

Here, src is the input image, and ‘lowerb’ and ‘upperb’ denote the lower and upper boundaries of the threshold region. A pixel is set to 255 if it lies within the specified boundaries, and to 0 otherwise. The function returns the thresholded image.

A nice way to understand any method is to play with its arguments, and for that, trackbars come in very handy. Let’s segment the image based on color, since any color (and its shades) mostly covers some range of intensity values; this makes the function very useful for segmenting any color. Below is the code where I have created trackbars to segment any color in a live webcam feed.
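
A sketch along those lines, thresholding in HSV space (the camera index and window name are placeholders; press Esc to quit):

    import cv2
    import numpy as np

    def nothing(x):
        pass

    cv2.namedWindow('mask')
    for name, maxval in [('H_low', 179), ('S_low', 255), ('V_low', 255),
                         ('H_high', 179), ('S_high', 255), ('V_high', 255)]:
        cv2.createTrackbar(name, 'mask', 0, maxval, nothing)

    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        lower = np.array([cv2.getTrackbarPos(n, 'mask') for n in ('H_low', 'S_low', 'V_low')],
                         dtype=np.uint8)
        upper = np.array([cv2.getTrackbarPos(n, 'mask') for n in ('H_high', 'S_high', 'V_high')],
                         dtype=np.uint8)
        mask = cv2.inRange(hsv, lower, upper)
        cv2.imshow('mask', cv2.bitwise_and(frame, frame, mask=mask))
        if cv2.waitKey(1) & 0xFF == 27:
            break
    cap.release()
    cv2.destroyAllWindows()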

Play around with the trackbars to get a feel for the cv2.inRange function. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.