Category Archives: Image Processing

Dilation

In the previous blog, we discussed the erosion operation. In this blog, we will cover another morphological operation, dilation, which is the dual of erosion: dilating the object region is equivalent to eroding the background region, and vice versa. So, let’s get started.

Dilation

As the name suggests, this operation dilates (expands) the object region, which is just the opposite of erosion. Here we ask a simple question: does the structuring element (SE) hit the object? (“Hits” means that at least one of the image pixels lying under a 1 of the SE is an object pixel.) If it hits, the output pixel is set to 1; otherwise it is set to 0. That is the whole concept behind dilation. Now, let’s formulate this as a set operation.

In general, the dilation of the binary image A by some SE B is defined as

$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\}$$

That is, the set of all displacements z such that B, reflected about its origin and then translated by z, overlaps A in at least one element. In other words, we place the SE over the image so that its origin coincides with the input pixel position and compare the SE with the image pixels underneath it. If they share at least one common element, the image pixel under the origin is set to 1, else 0.

Thus, dilation increases the size of the object. If holes or pepper noise are present in the object, dilation bridges the gaps or removes the noise, similar to what we discussed in low-pass filtering. The extent of thickening is controlled by the shape and size of the SE.
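The hit test described above can be sketched directly in NumPy. This is only an illustration under a few assumptions: the helper name `dilate_binary` is hypothetical, the SE has odd dimensions with its origin at the center, and pixels outside the image are treated as background.

```python
import numpy as np

def dilate_binary(image, se):
    """Binary dilation: output is 1 wherever the reflected SE hits the object."""
    h, w = se.shape
    ph, pw = h // 2, w // 2                 # origin assumed at the SE center
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=0)
    se_reflected = se[::-1, ::-1]           # reflect the SE about its origin
    out = np.zeros_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + h, c:c + w]
            # "hit": at least one object pixel lies under a 1 of the SE
            out[r, c] = np.any(window[se_reflected == 1] == 1)
    return out

image = np.zeros((5, 5), dtype=np.uint8)
image[2, 2] = 1                             # a single object pixel
se = np.ones((3, 3), dtype=np.uint8)
result = dilate_binary(image, se)
print(result)
```

The single object pixel grows into a 3×3 block, which is exactly the shape of the SE.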

For binary images, this can be simply done by taking the maximum of the neighborhood defined by the SE. Now, let’s see how to do this using OpenCV-Python. OpenCV provides a builtin function for this as shown below.

Here, src is the input image with any number of channels (all are processed independently), and kernel is the structuring element whose origin is defined by anchor (default (-1,-1), i.e. the center of the SE). You can create the SE using cv2.getStructuringElement() or simply with numpy. iterations specifies how many times to repeat the dilation. It is sometimes useful to pad the image to account for the boundary pixels or for a non-regular image shape; this can be done using the “borderType” and “borderValue” arguments. Below is an example where we dilate the image with a rectangular SE.

Similar to erosion, dilation can also be used to remove noise, detect the object boundary, etc. However, neither erosion nor dilation alone is very effective at reducing noise. A more effective approach is erosion followed by dilation, known as the opening operation. Most of the morphological algorithms that we will discuss in the next blogs are also based on dilation and erosion. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Erosion

In the previous blog, we touched on the introduction of morphological operations. In this blog and the next blog, we will discuss two of the most fundamental morphological operations – Erosion and Dilation. All other morphological operations can be defined in terms of these two basic operations. So, let’s get started.

Erosion

As the name suggests, this operation erodes (removes) pixels from the object boundary. Here we ask a simple question: does the structuring element (SE) fit the object? (“Fits” means that every image pixel lying under a 1 of the SE is an object pixel.) If the SE fits, the pixel is assigned 1; otherwise it is eroded (assigned 0). Thus, if we use a square SE (say of size 3×3), all the object boundary pixels will be eroded away. Now, let’s understand this as a set operation.

In general, the erosion of the binary image A by some SE B is defined as

$$A \ominus B = \{\, z \mid (B)_z \subseteq A \,\}$$

That is, the set of all displacements z such that B translated by z is a subset of A, i.e. is contained in A. In other words, you shift the SE over the image, set to 1 the positions where the SE does not share any element with the background, and erode (set to 0) all the remaining positions.
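The fit test can be sketched in NumPy as follows. The helper name `erode_binary` is hypothetical, the SE is assumed to have odd dimensions with its origin at the center, and pixels outside the image are assumed to be background (a library implementation may pad differently).

```python
import numpy as np

def erode_binary(image, se):
    """Binary erosion: output is 1 only where the SE fits inside the object."""
    h, w = se.shape
    ph, pw = h // 2, w // 2                 # origin assumed at the SE center
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=0)
    out = np.zeros_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + h, c:c + w]
            # "fit": every pixel under a 1 of the SE is an object pixel
            out[r, c] = np.all(window[se == 1] == 1)
    return out

image = np.zeros((7, 7), dtype=np.uint8)
image[1:6, 1:6] = 1                         # 5x5 object
se = np.ones((3, 3), dtype=np.uint8)
eroded = erode_binary(image, se)
print(eroded)
```

The one-pixel-wide boundary of the 5×5 object is removed, leaving a 3×3 core.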

Thus this results in a decrease in the object area. If some holes are present in the object, this operation tends to increase the hole area. For binary images, this can be simply applied by taking the minimum of the neighborhood defined by the SE. Now, let’s see how to do this using OpenCV-Python. OpenCV provides a builtin function for this as shown below.

Here, src is the input image with any number of channels (all are processed independently), and kernel is the structuring element whose origin is defined by anchor (default (-1,-1), i.e. the center of the SE). You can create the SE using cv2.getStructuringElement() or simply with numpy. iterations specifies how many times to repeat the erosion. It is sometimes useful to pad the image to account for the boundary pixels or for a non-regular image shape; this can be done using the “borderType” and “borderValue” arguments. Below is an example where we erode the image with a rectangular SE.

Different structuring elements, whether in shape or size, produce different results. Disc-shaped structuring elements are a common choice. If the SE is larger than the object, the entire object is eroded away. Erosion can be useful for removing noise (subject to some conditions), detecting object boundaries (by subtracting the eroded image from the original one), separating connected components or structures of a certain shape or size, etc.

In the next blog, we will discuss another morphological operation known as Dilation in greater detail. Hope you enjoy reading.

Morphological Image Processing

In the previous blogs, we discussed various thresholding algorithms such as Otsu, adaptive, BHT, etc. All of these produce a binary image, which in general is distorted by noise, holes, etc., so we need to process these images to remove the imperfections. Sometimes we also need to extract image features, such as boundaries, that are useful for representing the object of interest. All of this is done using morphological image processing, which applies non-linear transformations based on the image shape. Now, let’s discuss why it is called morphology.

In general, morphology is the study of the form and structure of things. When we convert an image to binary, we lose the intensity information, so the only information that remains is the spatial location, or structure, of the image. That is why this is known as morphological image processing.

Morphological image processing was originally developed for binary images but later this was also extended to the grayscale images. The watershed algorithm is an outcome of this generalization.

Let’s understand the concept behind morphological image processing (MIP) with respect to the convolution operation that we studied earlier.

Remember in convolution, we have a filter/window and we move this filter over the image. The output values are then computed by some linear operation between the filter weights and the pixel values. Similarly, in MIP we have a structuring element and we move this over the entire image. The output values are computed by applying non-linear operations like set operations (intersection, union, etc.) between the structuring element and the underlying pixel values. These non-linear operations are known as morphological operations.

The above paragraph introduces two terms: the structuring element and morphological operations. So, let’s first understand what a structuring element is.

The structuring element is a binary image (consisting of 0’s and 1’s) that is used to probe an image for finding the region of interest. For instance, if we want to detect lines in an image, we create a linear structuring element. The pattern of 1’s and 0’s specifies the shape of the structuring element. Below is an example of elliptical SE.

The dimensions of the SE are usually odd, with the origin at the center. OpenCV provides a builtin function for creating an SE as shown below.

Here, shape refers to the SE shape. This can take one of the following values

  • cv2.MORPH_RECT – creates a rectangular SE.
  • cv2.MORPH_ELLIPSE – creates an elliptical SE.
  • cv2.MORPH_CROSS – cross-shaped SE.

The “ksize” specifies the size of the SE and anchor specifies the origin position (default is (-1,-1), i.e. at the center of the SE). Below is an example that creates an elliptical SE.

To obtain the output, morphological operations are performed between the SE and the underlying pixel values. These operations are basic set operations such as union and intersection. For instance, one morphological operation produces the set of all positions at which at least one SE pixel matches the underlying image pixel. This operation usually increases the size of the object and fills holes present in it. The figure below shows this morphological operation.

If you use a different SE, the result will be different, so select the shape of the SE according to your application. In the next blog, we will discuss various morphological operations in detail. Hope you enjoy reading.

Adaptive Thresholding

In the previous blog, we discussed how global thresholding can be a tedious task when dealing with images having non-uniform illumination. This is because you need to ensure that while subdividing an image, each sub-image histogram is bimodal. Otherwise, the segmentation task will fail.

In this blog, we will discuss adaptive thresholding, which works well under varying conditions such as non-uniform illumination. Here, the threshold value is calculated separately for each pixel using some statistic obtained from its neighborhood. This way we get different thresholds for different image regions, which tackles the problem of varying illumination.

The whole procedure can be summed up as:

  • For each pixel in the image
    • Calculate the statistics (such as mean, median, etc.) from its neighborhood. This will be the threshold value for that pixel.
    • Compare the pixel value with this threshold
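The procedure above can be sketched in NumPy using the neighborhood mean as the statistic. The helper name `adaptive_mean_threshold`, the edge-replicating padding, and the synthetic test image are assumptions made for this illustration.

```python
import numpy as np

def adaptive_mean_threshold(image, block_size=3, c=0, max_value=255):
    """Threshold each pixel against the mean of its block_size x block_size neighborhood."""
    pad = block_size // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    out = np.zeros_like(image)
    for r in range(image.shape[0]):
        for col in range(image.shape[1]):
            neighborhood = padded[r:r + block_size, col:col + block_size]
            threshold = neighborhood.mean() - c      # per-pixel threshold
            if image[r, col] > threshold:
                out[r, col] = max_value
    return out

# Unevenly lit background (brightness ramp) with two dark marks
image = np.tile(np.arange(100, 200, 10, dtype=np.uint8), (5, 1))
image[2, 2] -= 40
image[2, 7] -= 40

binary = adaptive_mean_threshold(image, block_size=3, c=5)
print(binary)
```

Note that the dark mark on the bright side (value 130) is brighter than the background on the dark side (value 100), so no single global threshold could separate both marks from the background, while the per-pixel threshold does.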

Now, let’s discuss the OpenCV function for adaptive thresholding.

  • src: 8-bit greyscale image
  • thresholdType: This tells us what value to assign to pixels greater/less than the threshold. Must be either THRESH_BINARY or THRESH_BINARY_INV. (You can read more about it here).
  • maxValue: This is the value assigned to the pixels after thresholding. This depends on the thresholding type. If the type is cv2.THRESH_BINARY, all the pixels greater than the threshold are assigned this maxValue.
  • adaptiveMethod: This tells us how the threshold is calculated from the pixel neighborhood. This currently supports two methods:
    • cv2.ADAPTIVE_THRESH_MEAN_C: In this, the threshold value is the mean of the neighborhood area.
    • cv2.ADAPTIVE_THRESH_GAUSSIAN_C: In this, the threshold value is the weighted sum of the neighborhood area. This uses Gaussian weights computed using the cv2.getGaussianKernel() method. You can read more about it here.
  • blockSize: This is the neighborhood size. It must be an odd number such as 3, 5, 7, etc.
  • C: a constant which is subtracted from the mean or weighted mean to obtain the threshold.

As discussed, OpenCV only provides the mean and the Gaussian-weighted mean as the threshold. But don’t limit yourself to these two statistics. Try other statistics like the standard deviation, median, etc. by writing your own helper function. Let’s see how to use this.

See how effective adaptive thresholding is in the case of non-uniform illumination. Hope you enjoy reading.

Balanced histogram thresholding

In the previous blogs, we discussed different methods for automatically finding the global threshold for an image, for instance the iterative method and Otsu’s method. In this blog, we will discuss another very simple approach for automatic thresholding: balanced histogram thresholding (BHT). As the name suggests, this method tries to find the threshold automatically by balancing the image histogram. Let’s understand this method in detail.

Note: This method assumes that the image histogram is bimodal and a reasonable contrast ratio exists between the background and the region of interest.

Concept

Suppose you have a perfectly balanced histogram, i.e. a histogram where the distributions of the background and the ROI are the same. If you place such a histogram on a lever, it will be balanced, and the optimum threshold will be at the center of the lever, as shown in the figure below.

Source: BHT

This is the main idea behind the Balanced Histogram Thresholding. This method tries to balance the image histogram and then infer the threshold value from that.

But in real-life situations, we don’t encounter images with such perfectly balanced histograms. So, let’s see how this method balances the unbalanced histograms.

  • First, it places the histogram on the lever and calculates the center point.
  • Then it calculates the weights on the left and right sides of the center point.
  • It removes weight from the heavier side and adjusts the center.
  • It repeats the above two steps until the starting point and the endpoint are both equal to the center.

The whole procedure can be summed up in the below gif (taken from Wikipedia)

Below is the python code for this. Here, i_s, i_e are the starting and the endpoints of the histogram and i_m is the center
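The steps above can be sketched as follows. This is one possible implementation using the names i_s, i_e and i_m from the text; the function name `balanced_histogram_threshold` and the toy histogram are assumptions for illustration.

```python
import numpy as np

def balanced_histogram_threshold(hist):
    """Return the bin index at which the histogram 'lever' balances."""
    hist = np.asarray(hist, dtype=np.float64)
    i_s, i_e = 0, len(hist) - 1          # starting point and endpoint
    i_m = (i_s + i_e) // 2               # center of the lever
    w_l = hist[i_s:i_m + 1].sum()        # weight on the left (center included)
    w_r = hist[i_m + 1:i_e + 1].sum()    # weight on the right
    while i_s < i_e:
        if w_r > w_l:                    # right side heavier: drop its end bin
            w_r -= hist[i_e]
            i_e -= 1
        else:                            # left side heavier (or equal): drop its start bin
            w_l -= hist[i_s]
            i_s += 1
        new_m = (i_s + i_e) // 2         # re-center the lever
        if new_m < i_m:                  # center moved left: bin i_m changes side
            w_l -= hist[i_m]
            w_r += hist[i_m]
        elif new_m > i_m:                # center moved right: bin i_m + 1 changes side
            w_l += hist[i_m + 1]
            w_r -= hist[i_m + 1]
        i_m = new_m
    return i_m

# Toy bimodal histogram with modes around bins 1-2 and 7-8
hist = [0, 5, 5, 0, 0, 0, 0, 5, 5, 0]
t = balanced_histogram_threshold(hist)
print(t)
```

For a real image, the histogram could be computed with np.bincount(image.ravel(), minlength=256) before calling the function.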

The above function takes the image histogram as the input and returns the optimum threshold. Let’s take an example to check how this works.

Below is the histogram of the image constructed.

Now, let’s apply the Balanced Histogram thresholding method to check what threshold value this outputs.

87 looks like a reasonable threshold, check the image histogram above. So, that’s all for this time. Hope you enjoy reading.

Optimum Global Thresholding using Otsu’s Method

In the previous blog, we discussed global thresholding and how to find the global threshold using the iterative approach. In this blog, we will discuss Otsu’s method, named after Nobuyuki Otsu, that automatically finds the global threshold. So, let’s discuss this method in detail.

Note: This method assumes that the image histogram is bimodal and a reasonable contrast ratio exists between the background and the region of interest.

In simple terms, Otsu’s method tries to find the threshold value that minimizes the weighted within-class variance. Since variance measures the spread of a distribution about its mean, minimizing the within-class variance tends to make the classes compact.

Let’s say we threshold a histogram at a value “t”. This produces two regions, left and right of “t”, whose variances are $\sigma_0^2(t)$ and $\sigma_1^2(t)$. Then the weighted within-class variance is given by

$$\sigma_w^2(t) = w_0(t)\,\sigma_0^2(t) + w_1(t)\,\sigma_1^2(t)$$

where w0(t) and w1(t) are the weights given to each class. Each weight is the number of pixels in the thresholded region (left or right) divided by the total number of image pixels. Let’s take a simple example to understand how to calculate these.

Suppose we have the following histogram and we want to find the weighted within-class variance corresponding to threshold value 1.

Below are the weights and the variances calculated for left and the right regions obtained after thresholding at value 1.

Similarly, we will iterate over all the possible threshold values, calculate the weighted within-class variance for each of the thresholds. The optimum threshold will be the one with the minimum within-class variance.

Now, let’s see how to do this using python.

The image histogram is shown below

Now, let’s calculate the within-class variance using the steps which we discussed earlier.
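The steps discussed earlier can be sketched as a brute-force search over all thresholds. The function name `otsu_within_class`, the toy histogram, and the convention that levels below t form the left class are assumptions for this illustration.

```python
import numpy as np

def otsu_within_class(hist):
    """Return (threshold, variance) minimizing the weighted within-class variance."""
    hist = np.asarray(hist, dtype=np.float64)
    total = hist.sum()
    levels = np.arange(len(hist))
    best_t, best_var = 0, np.inf
    for t in range(1, len(hist)):
        left, right = hist[:t], hist[t:]         # left: levels < t, right: levels >= t
        n0, n1 = left.sum(), right.sum()
        if n0 == 0 or n1 == 0:                   # one class empty: skip
            continue
        w0, w1 = n0 / total, n1 / total          # class weights
        mu0 = (levels[:t] * left).sum() / n0     # class means
        mu1 = (levels[t:] * right).sum() / n1
        var0 = (((levels[:t] - mu0) ** 2) * left).sum() / n0
        var1 = (((levels[t:] - mu1) ** 2) * right).sum() / n1
        within = w0 * var0 + w1 * var1           # weighted within-class variance
        if within < best_var:
            best_t, best_var = t, within
    return best_t, best_var

# Toy bimodal histogram with modes around bins 1-2 and 7-8
hist = [0, 5, 5, 0, 0, 0, 0, 5, 5, 0]
t, var = otsu_within_class(hist)
print(t, var)
```

Any threshold that falls in the empty valley between the two modes gives the same minimum here; the loop keeps the first one it finds.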

The gif below shows how the within-class variance (blue dots) varies with the threshold value for the above histogram. The optimum threshold value is the one where the within-class variance is minimum.

OpenCV also provides a builtin function to calculate the threshold using this method.

OpenCV

You just need to pass an extra flag, cv2.THRESH_OTSU in the cv2.threshold() function which we discussed in the previous blog. The optimum threshold value will be returned by this along with the thresholded image. Let’s see how to use this.

A Faster Approach

Minimizing the within-class variance is equivalent to maximizing the between-class variance. This maximization can be implemented recursively, updating the weights and means from one threshold to the next, and is faster than the earlier method. The expression for between-class variance is given by

$$\sigma_b^2(t) = w_0(t)\,w_1(t)\,\left[\mu_0(t) - \mu_1(t)\right]^2$$

Below are the steps to calculate recursively between-class variance.

  1. Calculate the histogram of the image.
  2. Set up weights and means corresponding to the “0” threshold value.
  3. Loop through all the threshold values
    1. Update the weights and the mean
    2. Calculate the between-class variance
  4. The optimum threshold will be the one with the max variance.

Below is the code in Python that implements the above steps.
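One way to sketch these steps is shown below. The function name `otsu_between_class` and the toy histogram are assumptions; here levels up to and including t form the first class, and the weights and means are updated from the running sums instead of being recomputed for every threshold.

```python
import numpy as np

def otsu_between_class(hist):
    """Return the threshold maximizing the between-class variance."""
    hist = np.asarray(hist, dtype=np.float64)
    total = hist.sum()
    sum_all = (np.arange(len(hist)) * hist).sum()
    n0 = 0.0                                 # running pixel count of class 0
    sum0 = 0.0                               # running intensity sum of class 0
    best_t, best_var = 0, -1.0
    for t in range(len(hist)):
        n0 += hist[t]                        # update the weight ...
        sum0 += t * hist[t]                  # ... and the mean's numerator
        n1 = total - n0
        if n0 == 0 or n1 == 0:               # one class empty: skip
            continue
        mu0, mu1 = sum0 / n0, (sum_all - sum0) / n1
        between = (n0 / total) * (n1 / total) * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Toy bimodal histogram with modes around bins 1-2 and 7-8
hist = [0, 5, 5, 0, 0, 0, 0, 5, 5, 0]
t = otsu_between_class(hist)
print(t)
```

Each threshold now costs a constant amount of work, compared with a full pass over the histogram in the within-class version.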

This is how you can implement Otsu’s method recursively by maximizing the between-class variance. Now, let’s discuss the limitations of this method.

Limitations

Otsu’s method works reliably only when

  • The histogram is bimodal.
  • A reasonable contrast ratio exists between the background and the ROI.
  • The lighting conditions are uniform.
  • The image is not affected by noise.
  • The sizes of the background and the ROI are comparable.

Many modifications of the original Otsu’s algorithm, such as the two-dimensional Otsu’s method, address these limitations. In the following blogs, we will discuss some of these modifications and how to counter these limitations so as to get satisfactory results with Otsu’s method. Hope you enjoy reading.

Improving Global Thresholding

In the previous blog, we discussed Otsu’s method for automatic image thresholding and its limitations. In this blog, we will discuss how to handle these limitations so as to produce satisfactory thresholding results. So, let’s get started.

Case-1: When the noise is present in the image

If noise is present in the image, it tends to change the modality of the histogram: the sharp valleys between the peaks of the bimodal histogram start degrading. In that case, Otsu’s method or any other global thresholding method will fail. So, in order to find the global threshold, one should first remove the noise using a smoothing filter such as a Gaussian and then apply an automatic thresholding method such as Otsu’s.

Case-2: When the object area is small compared to the background area

In this case, the image histogram will be dominated by the large background area, which increases the probability of any pixel belonging to the background. The histogram will no longer exhibit bimodality, and thus Otsu’s method will produce segmentation errors. To prevent this, one should only consider pixels that lie on or near the edges between the objects and the background. Doing so results in an image histogram with peaks of approximately the same size, to which we can apply any automatic thresholding method such as Otsu’s. Below are the steps to implement the above procedure.

  • Calculate the edge image using any high pass filter like Sobel, Laplacian, etc.
  • Select any threshold value (T).
  • Threshold the above edge image to produce a binary mask.
  • Apply the mask image on the input image using any bitwise operations or any other method.
  • This results in only those pixels where the mask image was white.
  • Compute the histogram of only those pixels
  • Finally, apply any automatic global thresholding method like otsu, etc.
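The steps above can be sketched in NumPy as follows. The compact `otsu` helper, the use of np.gradient as the high pass filter, the edge threshold of half the maximum edge strength, and the synthetic image are all assumptions made for this illustration.

```python
import numpy as np

def otsu(values):
    """Otsu threshold (max between-class variance) for an array of 8-bit values."""
    hist = np.bincount(values.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    n0 = np.cumsum(hist)                      # class 0: levels <= t
    n1 = hist.sum() - n0
    s0 = np.cumsum(levels * hist)
    s1 = s0[-1] - s0
    valid = (n0 > 0) & (n1 > 0)
    between = np.zeros(256)
    between[valid] = n0[valid] * n1[valid] * (
        s0[valid] / n0[valid] - s1[valid] / n1[valid]) ** 2
    return int(np.argmax(between))

# Synthetic image: small bright object on a large dark background
img = np.full((100, 100), 50, dtype=np.uint8)
img[45:55, 45:55] = 200

# 1. Edge image from the gradient magnitude (a simple high pass filter)
gy, gx = np.gradient(img.astype(np.float64))
edges = np.hypot(gx, gy)

# 2-3. Threshold the edge image with a hand-picked T to get a binary mask
T = 0.5 * edges.max()
mask = edges > T

# 4-6. Keep only the pixels under the mask and threshold their histogram
t = otsu(img[mask])

# 7. Apply the resulting global threshold to the whole image
binary = np.where(img > t, 255, 0).astype(np.uint8)
print(t, int(mask.sum()))
```

Only the pixels on either side of the object border survive the mask, so the dark and bright populations that reach Otsu’s method are of similar size.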

Case-3: When the image is taken under non-uniform illumination conditions

In this case, the histogram no longer remains bimodal and thus we will not be able to segment the image satisfactorily. One of the simplest approaches is to subdivide the image into non-overlapping rectangles, with the rectangle size chosen such that the illumination is nearly constant within each of them. Then we apply a global thresholding technique such as Otsu’s to each rectangle.

The above procedure only works when the sizes of the object and the background are comparable within each rectangle, since only then is the rectangle’s histogram bimodal. Taking care of the background and object sizes in each rectangle is a tedious task.

So, in the next blog, we will discuss adaptive thresholding that works pretty well for the above conditions. That’s all for this blog. Hope you enjoy reading.

Global Thresholding

In the previous blog, we discussed image thresholding and when to use this for image segmentation. We also learned that thresholding can be global or adaptive depending upon how the threshold value is selected.

In this blog, we will discuss

  • global thresholding
  • OpenCV function for global thresholding
  • How to choose threshold value using the iterative algorithm

In global thresholding, each pixel value in the image is compared with a single (global) threshold value. Below is the code for this.
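A minimal NumPy sketch of this comparison; the function name `global_threshold` and its default values are assumptions for illustration:

```python
import numpy as np

def global_threshold(image, thresh, val_high=255, val_low=0):
    """Assign val_high to pixels greater than thresh and val_low to the rest."""
    return np.where(image > thresh, val_high, val_low).astype(np.uint8)

image = np.array([[10, 100],
                  [150, 250]], dtype=np.uint8)
binary = global_threshold(image, 127)
print(binary)
```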

Here, we assign a value of “val_high” to all the pixels greater than the threshold and “val_low” to all the others. OpenCV also provides a builtin function for thresholding the image. So, let’s take a look at that function.

OpenCV

This function returns the thresholded image(dst) and the threshold value(retval). Its arguments are

  • src: input greyscale image (8-bit or 32-bit floating point)
  • thresh: global threshold value
  • type: Different types that decide “val_high” and “val_low“. In other words, these types decide what value to assign for pixels greater than and less than the threshold. Below figure shows different thresholding types available.
  • maxval: maximum value to be used with THRESH_BINARY and THRESH_BINARY_INV. Check the below image.

To specify the thresholding type, write “cv2.” as the prefix. For instance, write cv2.THRESH_BINARY if you want to use this type. Let’s take an example.

Similarly, you can apply other thresholding types to check how they work. So far we have discussed how to threshold an image using a global threshold value, but not how to obtain this threshold value. Let’s discuss this in the next section.

How to choose the threshold value?

As already discussed, global thresholding is a suitable approach only when the intensity distributions of the background and the ROI are sufficiently distinct, in other words when there is a clear valley between the peaks of the histogram. In that situation we can easily select the threshold value. But what if we have a large number of images? We don’t want to manually check each image histogram and then decide the threshold value. We want something that can automatically estimate the threshold value for each image. Below is the algorithm that can be used for this purpose.

Source: Wikipedia

Below is the code for the above algorithm.
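As a sketch, assuming the algorithm is the usual iterative selection scheme (start from an initial guess, split the pixels into two groups, move the threshold to the average of the two group means, and repeat until it stops changing); the function name `iterative_threshold` and the stopping tolerance `eps` are hypothetical:

```python
import numpy as np

def iterative_threshold(image, eps=0.5):
    """Estimate a global threshold by repeatedly averaging the two group means."""
    t = float(image.mean())                   # initial guess: mean intensity
    while True:
        low = image[image <= t]               # pixels at or below the threshold
        high = image[image > t]               # pixels above the threshold
        if low.size == 0 or high.size == 0:   # cannot split any further
            return t
        new_t = 0.5 * (low.mean() + high.mean())
        if abs(new_t - t) < eps:              # converged
            return new_t
        t = new_t

# Synthetic image: bright square (200) on a dark background (50)
img = np.full((10, 10), 50, dtype=np.uint8)
img[3:7, 3:7] = 200
t = iterative_threshold(img)
print(t)
```

For this two-level image the threshold settles midway between the two intensities.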

Now, let’s take an example to check how this works.

That’s all for this blog. In the next blog, we will discuss how to perform optimum global thresholding using Otsu’s method. Hope you enjoy reading.

Image Thresholding

Image segmentation is the process of subdividing an image into its constituent regions or objects. In many computer vision applications, image segmentation is very useful for detecting the region of interest: for instance, in medical imaging we have to locate tumors, in self-driving cars we have to detect pedestrians, traffic signals, etc., and similar needs arise in video surveillance. There are a number of methods available to perform image segmentation, such as thresholding, clustering methods, graph partitioning methods, and convolutional methods, to mention a few.

In this blog, we will discuss Image Thresholding which is one of the simplest methods for image segmentation. In this, we partition the images directly into regions based on the intensity values. So, let’s discuss image thresholding in greater detail.

Concept

If the pixel value is greater than a threshold value, it is assigned one value (maybe white), else it is assigned another value (maybe black).

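In other words, if f(x,y) is the input image and T is the threshold value, then the segmented image g(x,y) is given by

$$g(x, y) = \begin{cases} 1, & f(x, y) > T \\ 0, & \text{otherwise} \end{cases}$$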

If the threshold value T remains constant over the entire image, then this is known as global thresholding. When the value of T changes over the entire image or depends upon the pixel neighborhood, then this is known as adaptive thresholding. We will cover both these types in greater detail in the following blogs.

Applicability Condition

Thresholding works reliably only when there is a good contrast ratio between the region of interest and the background. Otherwise, thresholding will not be able to fully detect the region of interest. Let’s understand this with an example.

Suppose we have two images from which we want to segment the square region (our region of interest) from the background.

Let’s plot the histogram of these two images.

As expected, the histogram for “A” shows two peaks corresponding to the square and the background. The separation between the peaks shows that the background and the ROI have a good contrast ratio, so by choosing a threshold value between the peaks we can segment out the ROI. For “B”, the intensity distributions of the ROI and the background are not that distinct, so we may not be able to fully segment the ROI.

Thresholded images are shown below (How to choose a threshold value will be discussed in the next blog).

So, always plot the image histogram to check the contrast ratio between the background and the ROI. Choose thresholding for image segmentation only if the contrast ratio is good. Otherwise, look for other methods.

In the next blog, we will discuss global thresholding and how to choose the threshold value using the iterative method. Hope you enjoy reading.

Laplacian of Gaussian (LoG)

In the previous blog, we discussed various first-order derivative filters. In this blog, we will discuss the Laplacian of Gaussian (LoG), a second-order derivative filter. So, let’s get started.

Mathematically, the Laplacian is defined as

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$

Unlike first-order filters, which detect edges at local maxima or minima of the response, the Laplacian detects edges at zero crossings, i.e. where the value changes from negative to positive or vice versa.

Let’s obtain kernels for Laplacian similar to how we obtained kernels using finite difference approximations for the first-order derivative.

Adding these two kernels together we obtain the Laplacian kernel as shown below

This is called a negative Laplacian because the central peak is negative. Other variants of the Laplacian can be obtained by also weighting the pixels in the diagonal directions. Make sure that the kernel elements sum to zero so that the filter gives zero response in homogeneous regions.
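The construction above can be sketched in NumPy, assuming the standard second-difference approximation f(x+1) - 2f(x) + f(x-1) along each axis. The diagonal variant shown is one common choice, not necessarily the one the original figure used.

```python
import numpy as np

# Second-difference kernel along x: f(x+1, y) - 2 f(x, y) + f(x-1, y)
d2x = np.array([[0,  0, 0],
                [1, -2, 1],
                [0,  0, 0]])
d2y = d2x.T                      # same approximation along y

laplacian = d2x + d2y            # negative Laplacian: the center weight is -4
print(laplacian)

# A variant that also weights the diagonal neighbors
laplacian_diag = np.array([[1,  1, 1],
                           [1, -8, 1],
                           [1,  1, 1]])

# Both kernels sum to zero, so flat regions give zero response
print(laplacian.sum(), laplacian_diag.sum())
```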

Let’s now discuss some properties of the Laplacian

  • Unlike first-order filters, which require two masks for finding edges, the Laplacian uses a single mask, but the edge orientation information is lost.
  • The Laplacian gives better edge localization than first-order filters.
  • Unlike first-order filters, the Laplacian is an isotropic filter, i.e. it produces a uniform edge magnitude for all directions.
  • Similar to first-order filters, the Laplacian is also very sensitive to noise.

To reduce the effect of noise, the image is first smoothed with a Gaussian filter, and then we find the zero crossings using the Laplacian. This two-step process is called the Laplacian of Gaussian (LoG) operation.

But this can also be performed in one step. Instead of first smoothing an image with a Gaussian kernel and then taking its Laplacian, we can obtain the Laplacian of the Gaussian kernel and then convolve it with the image. This is shown below where f is the image and g is the Gaussian kernel.

$$\nabla^2 (f * g) = f * \nabla^2 g$$

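Now, let’s see how to obtain the LoG kernel. Mathematically, LoG can be written as

$$\mathrm{LoG}(x, y) = -\frac{1}{\pi\sigma^4}\left[1 - \frac{x^2 + y^2}{2\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}}$$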

The LoG kernel weights can be sampled from the above equation for a given standard deviation, just as we did in Gaussian Blurring. Just convolve the kernel with the image to obtain the desired result, as easy as that.
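Sampling the kernel can be sketched as below, assuming the common form of the LoG function with a negative central peak. The function name `log_kernel`, the 9×9 size, and sigma = 1.4 are arbitrary choices for illustration, and subtracting the mean is one simple way to make the sampled weights sum exactly to zero.

```python
import numpy as np

def log_kernel(size, sigma):
    """Sample a size x size LoG kernel for the given standard deviation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    kernel = (-1.0 / (np.pi * sigma ** 4)) \
        * (1 - r2 / (2 * sigma ** 2)) \
        * np.exp(-r2 / (2 * sigma ** 2))
    # Sampling and truncation leave a small residual sum; remove it so that
    # homogeneous regions give zero response
    return kernel - kernel.mean()

kernel = log_kernel(9, 1.4)
print(np.round(kernel, 4))
```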

Select the size of the Gaussian kernel carefully. If LoG is used with a small Gaussian kernel, the result can be noisy. If you use a large Gaussian kernel, you may get poor edge localization.

Now, let’s see how to do this using OpenCV-Python

OpenCV-Python

OpenCV provides a builtin function that calculates the Laplacian of an image. You can find it here. Below is the basic syntax of what this function looks like

Steps for LoG:

  • Apply LoG on the image. This can be done in two ways:
    • First, apply Gaussian and then Laplacian or
    • Convolve the image with LoG kernel directly
  • Find the zero crossings in the image
  • Threshold the zero crossings to extract only the strong edges.

Let’s understand each step through code

Since a zero crossing is a change from negative to positive or vice versa, an approximate way to find the zero crossings is to clip the negative values.

Another way is to check each pixel for zero crossing as shown below
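A per-pixel check can be sketched as follows, assuming a zero crossing is marked wherever the 3×3 neighborhood contains both negative and positive responses. The function name `zero_crossings`, the `thresh` parameter, and the toy response are hypothetical.

```python
import numpy as np

def zero_crossings(log_img, thresh=0.0):
    """Mark pixels whose 3x3 neighborhood contains a sign change."""
    out = np.zeros(log_img.shape, dtype=np.uint8)
    rows, cols = log_img.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            patch = log_img[r - 1:r + 2, c - 1:c + 2]
            lo, hi = patch.min(), patch.max()
            # Sign change, and strong enough to count as an edge
            if lo < 0 < hi and (hi - lo) > thresh:
                out[r, c] = 255
    return out

# Toy LoG response: positive on one side of an edge, negative on the other
log_img = np.zeros((5, 6))
log_img[:, 2] = 64
log_img[:, 3] = -64

edges = zero_crossings(log_img)
print(edges)
```

Raising `thresh` keeps only the crossings with a large swing between the negative and positive responses, i.e. the strong edges.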

Depending on the image, you may need to apply thresholding and median blurring to suppress the noise.

Hope you enjoy reading.
