
Contrast Stretching

In the previous blog, we discussed the meaning of contrast in image processing, how to identify low and high contrast images, and finally the causes of low contrast in an image. In this blog, we will learn about methods of contrast enhancement.

The figure below summarizes the contrast enhancement process quite well.

Source: OpenCV

Depending upon the transformation function used, Contrast Enhancement methods can be divided into Linear and Non-Linear.

The linear method includes the contrast-stretching transformation, which uses a piecewise linear function, while the non-linear methods include histogram equalization, Gaussian stretch, etc., which use non-linear transformation functions obtained automatically from the histogram of the input image.

In this blog, we will discuss only the linear methods. The rest we will cover in the next blogs.

Contrast stretching, as the name suggests, is an image enhancement technique that tries to improve the contrast by stretching the intensity values of an image to fill the entire dynamic range. The transformation function used is always linear and monotonically increasing.

The figure below shows a typical transformation function used for contrast stretching.

By changing the location of points (r1, s1) and (r2, s2), we can control the shape of the transformation function. For example,

  1. When r1 = s1 and r2 = s2, the transformation is a linear function.
  2. When r1 = r2, s1 = 0 and s2 = L-1, the transformation becomes a thresholding function.
  3. When (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L-1), this is known as Min-Max Stretching.
  4. When (r1, s1) = (rmin + c, 0) and (r2, s2) = (rmax – c, L-1), this is known as Percentile Stretching.

Let’s understand Min-Max and Percentile Stretching in detail.

In Min-Max Stretching, the minimum and maximum intensity values of the input image are made to span the full dynamic range. In other words, the lowest value in the input image is mapped to 0 and the highest value is mapped to 255. All other intermediate values are reassigned new intensities according to the following formula
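s = (r – Xmin) * 255 / (Xmax – Xmin)

where r is the input pixel value, s is the output pixel value, and Xmin and Xmax are the minimum and maximum intensity values in the input image (for an L-level image, replace 255 with L-1).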

Sometimes, when Min-Max stretching is performed, the tail ends of the histogram become long, resulting in little improvement in image quality. So, it is better to clip a certain percentage (like 1% or 2%) of the data from the tail ends of the input image histogram. This is known as Percentile Stretching. The formula is the same as for Min-Max, but now Xmax and Xmin are the clipped values.

Let’s understand Min-Max and Percentile Stretching with an example. Suppose we have an image whose histogram looks like this

Clearly, this histogram has a left tail with few values (around 70 to 120). So, when we apply Min-Max Stretching, the result looks like this

Clearly, Min-Max stretching doesn’t improve the results much. Now, let’s apply Percentile Stretching

Because we clipped the long tail of the input histogram, Percentile Stretching produces much better results than Min-Max Stretching.

Let’s see how to perform Min-Max Stretching using OpenCV-Python
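The idea maps directly to a few lines of NumPy. Here is a minimal sketch (the filename 'input.jpg' is just a placeholder):

```python
import cv2
import numpy as np

# Read the image as greyscale ('input.jpg' is a placeholder filename)
img = cv2.imread('input.jpg', 0)

# Min-Max Stretching: map [Xmin, Xmax] linearly onto [0, 255]
Xmin, Xmax = float(img.min()), float(img.max())
stretched = (img - Xmin) * 255.0 / (Xmax - Xmin)
stretched = stretched.astype(np.uint8)

# cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX) gives the same result
cv2.imshow('Min-Max Stretching', stretched)
cv2.waitKey(0)
cv2.destroyAllWindows()
```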

For a color image, either convert it to greyscale and then apply contrast stretching, or convert it to another color model like HSV and apply contrast stretching on the V channel. For Percentile Stretching, just replace the min and max values with the clipped values; the rest of the code stays the same.
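Continuing from the Min-Max sketch above, a percentile version might look like this (the 2% clip is just an example):

```python
import numpy as np

# Clip, say, 2% from each tail of the histogram before stretching
Xmin, Xmax = np.percentile(img, (2, 98))
stretched = np.clip((img - Xmin) * 255.0 / (Xmax - Xmin), 0, 255)
stretched = stretched.astype(np.uint8)
```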

So, always plot the histogram first and then decide which method to follow. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

What is Contrast in Image Processing?

According to Wikipedia, Contrast is the difference in luminance or color that makes an object distinguishable from other objects within the same field of view.

Take a look at the images shown below

Source: OpenCV

Clearly, the left image has low contrast: it is difficult to identify the details in it compared to the right image.

A real-life example is a sunny versus a foggy day. On a sunny day everything looks clear to us and thus has high contrast, while on a foggy day everything looks nearly the same intensity (a dull, washed-out grey look).

A more reliable way to check whether an image has low or high contrast is to plot the image histogram. Let's plot the histograms for the above images.
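A minimal sketch using cv2.calcHist and Matplotlib ('input.jpg' is a placeholder filename):

```python
import cv2
from matplotlib import pyplot as plt

# Read the image as greyscale ('input.jpg' is a placeholder filename)
img = cv2.imread('input.jpg', 0)

# Compute a 256-bin histogram over the full intensity range [0, 256)
hist = cv2.calcHist([img], [0], None, [256], [0, 256])

plt.plot(hist)
plt.xlabel('Intensity value')
plt.ylabel('Pixel count')
plt.show()
```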

Clearly, from the left image's histogram, we can see that the intensity values are confined to a narrow range. Because it is hard to distinguish nearly identical intensity values (see the figure below: 150 and 148 are much harder to tell apart than 50 and 200), the left image has low contrast.

The right histogram widens the gap between the intensity values and, whoo! the details in the image become much more perceivable to us, yielding a high contrast image.

So, for high contrast, the image histogram should span the entire dynamic range, as shown above by the right histogram. In the next blogs, we will learn different methods to do this.

There is another naive approach where we subtract the minimum from the maximum intensity value and judge the image contrast from this difference. I would not recommend it, as it can be thrown off by outliers (we will discuss this in the next blogs). So, always plot the histogram to check.

So far we have discussed contrast, but not the causes of low contrast.

Low contrast images can result from poor illumination, lack of dynamic range in the imaging sensor, or even a wrong lens aperture setting during image acquisition.

When performing contrast enhancement, you must first decide whether you want global or local contrast enhancement. Global means increasing the contrast of the whole image, while in local enhancement we divide the image into small regions and enhance the contrast of each region independently. Don't worry, we will discuss these in detail in the next blogs.

This concept is beautifully illustrated by the figure below (taken from the OpenCV documentation).

Original Image

Clearly, with global enhancement the details on the face of the statue are lost, while they are preserved with local enhancement. So you need to be careful when selecting between these methods.

In the next blog, we will discuss the methods used to transform a low contrast image into a high contrast image. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Intensity-level Slicing

Intensity level slicing means highlighting a specific range of intensities in an image. In other words, we segment certain gray level regions from the rest of the image.

Suppose the region of interest in an image always takes values between, say, 80 and 150. Intensity level slicing highlights this range so that, instead of looking at the whole image, one can focus on the highlighted region of interest.

Since one can think of it as a piecewise linear transformation function, it can be implemented in several ways. Here, we will discuss the two basic types of slicing that are used most often.

  • In the first type, we display the desired range of intensities in white and suppress all other intensities to black, or vice versa. This results in a binary image. The transformation functions for both cases are shown below.
  • In the second type, we brighten or darken the desired range of intensities (a to b, as shown below) and leave the other intensities unchanged, or vice versa. The transformation functions for both cases, first where the desired range is changed and second where it is unchanged, are shown below.

Let's see how to do intensity level slicing using OpenCV-Python. The code below is for type 1 as discussed above.
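A minimal sketch (the range 80 to 150 and the filename are placeholders):

```python
import cv2

# Read the image as greyscale ('input.jpg' is a placeholder filename)
img = cv2.imread('input.jpg', 0)

# Type 1: pixels in [80, 150] become white (255), all others black (0)
sliced = cv2.inRange(img, 80, 150)

cv2.imshow('Intensity-level Slicing', sliced)
cv2.waitKey(0)
cv2.destroyAllWindows()
```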

For a color image, either convert it to greyscale first, or specify the minimum and maximum ranges as lists of BGR values.

Applications: Mostly used for enhancing features in satellite and X-ray images.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Power Law (Gamma) Transformations

Most of you might have heard of "gamma correction", a strange-sounding term. In this blog, we will see what it means and why it matters to you.

The general form of Power law (Gamma) transformation function is

s = c * r^γ

where 's' and 'r' are the output and input pixel values, respectively, and 'c' and γ are positive constants. Like the log transformation, power-law curves with γ < 1 map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher input values. For γ > 1, we get the opposite result, as shown in the figure below.

This is also known as gamma correction, gamma encoding or gamma compression. Don’t get confused.

The curves below are generated for r values normalized to the range 0 to 1 and then multiplied by the scaling constant c corresponding to the bit depth used.

All the curves are scaled this way, so don't get confused (see below).

But the main question is: why do we need this transformation? What's the benefit of doing so?

To understand this, we first need to know how our eyes perceive light. Human perception of brightness follows an approximate power function (as shown below), according to Stevens' power law for brightness perception.

As the figure above shows, if we change the input from 0 to 10, the output changes from 0 to about 50, but changing the input from 240 to 255 barely changes the output. This means that we are more sensitive to changes in the dark than in the bright. You may have noticed this yourself!

But our cameras do not work like this. Unlike human perception, a camera follows a linear relationship: if the light falling on the camera doubles, the output also doubles. The camera curve looks like this

So, where and what is the actual problem?

The actual problem arises when we display the image.

You might be amazed to know that all display devices, like your computer screen, have an intensity-to-voltage response curve which is a power function with an exponent (gamma) varying from 1.8 to 2.5.

This means that for any input signal (say from a camera), the output will be transformed by this gamma (also known as the display gamma) because of the non-linear intensity-to-voltage relationship of the display screen. This results in images that are darker than intended.

To correct this, we apply gamma correction to the input signal: since we know the display's power-law response, we simply apply its inverse, i.e. raise the signal to the power 1/γ. This is known as the image gamma. It is applied automatically by conversion algorithms like JPEG, so the image looks normal to us.

This input cancels out the effects generated by the display and we see the image as it is. The whole procedure can be summed up as by the following figure

If images are not gamma-encoded, they allocate too many bits to the bright tones that humans cannot differentiate and too few bits to the dark tones. By gamma encoding, we remove this artifact.

Images that are not properly corrected can look either bleached out or too dark.

Let's verify with code that γ < 1 produces images that are brighter, while γ > 1 results in images that are darker than intended.
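A minimal sketch (the gamma values 0.4 and 2.2 and the filename are example choices):

```python
import cv2
import numpy as np

# Read the image as greyscale ('input.jpg' is a placeholder filename)
img = cv2.imread('input.jpg', 0)

for gamma in [0.4, 2.2]:  # example values: <1 brightens, >1 darkens
    # Normalize to [0, 1], apply s = r^gamma, scale back to [0, 255]
    corrected = (np.power(img / 255.0, gamma) * 255).astype(np.uint8)
    cv2.imshow('gamma = ' + str(gamma), corrected)

cv2.waitKey(0)
cv2.destroyAllWindows()
```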

The output looks like this

Original Image
Gamma Encoded Images

I hope you understand Gamma encoding. In the next blog, we will discuss Contrast stretching, a Piecewise-linear transformation function in detail. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Creating Subplots in OpenCV-Python

In this blog, we will learn how to create subplots using OpenCV-Python. We know that cv2.imshow() shows only one image at a time, and displaying images side by side helps greatly in analyzing results. Unlike Matlab, OpenCV has no direct function for creating subplots. But since OpenCV reads images as arrays, we can concatenate arrays using the built-in cv2.hconcat() and cv2.vconcat() functions and then display the concatenated image using cv2.imshow().

cv2.hconcat([img1, img2]) returns the horizontally concatenated image; cv2.vconcat() works the same way vertically.

Below is a sample where I display two gamma-corrected images using this method.
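A minimal sketch (the gamma values and filename are example choices):

```python
import cv2
import numpy as np

# Read the image as greyscale ('input.jpg' is a placeholder filename)
img = cv2.imread('input.jpg', 0)

# Two gamma-corrected versions of the same image (gammas are examples)
g1 = (np.power(img / 255.0, 0.4) * 255).astype(np.uint8)
g2 = (np.power(img / 255.0, 2.2) * 255).astype(np.uint8)

# Concatenate horizontally and show both in a single window
cv2.imshow('gamma = 0.4 | gamma = 2.2', cv2.hconcat([g1, g2]))
cv2.waitKey(0)
cv2.destroyAllWindows()
```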

The output looks like this

To put text on the images, use cv2.putText(), and if you want spacing between the displayed images, use cv2.copyMakeBorder(). You can play around with many other OpenCV functions.

Note: array dimensions must match when using cv2.hconcat(). This means you cannot display a color and a greyscale image side by side with this method.

I hope this information will help you. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Multi Input and Multi Output Models in Keras

The Keras functional API is used to define complex models in deep learning. One of its good use cases is building a model with multiple inputs and outputs. In this blog, we will learn how to define a Keras model that takes more than one input and produces more than one output.

Multi Output Model

Let's say you are using the MNIST dataset (handwritten digit images) for both an autoencoder and a classification problem. In that case, you have a single input but multiple outputs: the predicted class and the generated image. Let's take a look at the code.
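A minimal sketch of the two output layers (the layer sizes are assumptions; the layer names match those used below):

```python
from keras.layers import Input, Dense

# Single input: a flattened 28x28 MNIST image
inputs = Input(shape=(784,))

# Shared encoding (layer size here is an assumption)
encoded = Dense(64, activation='relu')(inputs)

# Output 1: the predicted digit class
classification_output = Dense(10, activation='softmax',
                              name='classification_output')(encoded)

# Output 2: the reconstructed (generated) image
decoder_output = Dense(784, activation='sigmoid',
                       name='decoder_output')(encoded)
```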

In the above code, we have used a single input layer and two output layers, 'classification_output' and 'decoder_output'. Let's see how to create a model with these inputs and outputs.
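Using the functional API's Model class, a sketch:

```python
from keras.models import Model

# One input, two outputs
model = Model(inputs=inputs,
              outputs=[classification_output, decoder_output])
```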

Now that we have created the model, the next step is to compile it. Here we define one loss function for each output, and we can also assign weights to the losses. See the code.
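A sketch of the compile step (the particular losses and weights here are example choices):

```python
# One loss per output (keyed by layer name); the weights are example values
model.compile(optimizer='adam',
              loss={'classification_output': 'categorical_crossentropy',
                    'decoder_output': 'binary_crossentropy'},
              loss_weights={'classification_output': 1.0,
                            'decoder_output': 0.5})
```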

Multi Input Model

Let's take an example where you need two inputs: one grayscale image and one RGB image. Using these two images, you want to do image classification. To perform this, we will use the Keras functional API. Let's see the code.
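A minimal sketch (the image sizes, filter counts and class count are assumptions):

```python
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, Flatten, concatenate
from keras.models import Model

# Input 1: greyscale image; Input 2: RGB image (28x28 sizes are assumptions)
gray_input = Input(shape=(28, 28, 1))
rgb_input = Input(shape=(28, 28, 3))

# Extract a feature vector from each input separately
x1 = Conv2D(32, (3, 3), activation='relu')(gray_input)
x1 = MaxPooling2D((2, 2))(x1)
x1 = Flatten()(x1)

x2 = Conv2D(32, (3, 3), activation='relu')(rgb_input)
x2 = MaxPooling2D((2, 2))(x2)
x2 = Flatten()(x2)

# Concatenate both feature vectors and classify
merged = concatenate([x1, x2])
output = Dense(10, activation='softmax')(merged)

model = Model(inputs=[gray_input, rgb_input], outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```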

In the above code, we extracted feature layers from both inputs, concatenated them to create the output layer, and created a model with two inputs and one output.

A nice example where you can use both multiple inputs and multiple outputs is a capsule network. If you want to take a look into this, refer to this blog.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

EAT-NAS: Elastic Architecture Transfer for Neural Architecture Search

Recently, Jiemin Fang et al. published a paper that introduces a method to accelerate neural architecture search, named "elastic architecture transfer for accelerating large-scale neural architecture search". In this blog, we will learn what neural architecture search is, what limitations are associated with it, and how this paper overcomes those limitations.

Neural Architecture Search

Neural architecture search, as its name suggests, is a method to automatically search for the best network architecture for a given problem. If you have worked on neural networks, you may have encountered the problem of selecting the best hyperparameters for the network, i.e. which optimizer to select, what learning rate to use, how many layers to add, and so on. To solve this problem, different methods have emerged, like evolutionary search, reinforcement learning, gradient-based optimization, etc.

Neural architecture search methods are not fully automated, as they rely on a human-designed architecture as the starting point. These methods consist of three components:

  1. Search Space: a well-designed space in which the method searches for the best parameters.
  2. Search Method: which method to use, like reinforcement learning, evolutionary search, etc.
  3. Evaluation Strategy: which metric is used to find the best architecture.

Problem: any neural architecture search method incurs a large computational cost. Even with recent advances in this field, it still takes many GPU-days to find the best architecture.

EAT-NAS

To reduce the computational cost, current studies first search for architectures on small datasets and then apply them directly to large datasets. But architectures searched on small datasets are not guaranteed to perform well on large datasets. To solve this problem, the authors of EAT-NAS introduce an elastic architecture transfer method to accelerate neural architecture search.

How EAT-NAS works:

In this method, architectures are first searched on a small dataset, and the best one is selected as the basic architecture for the large dataset. This basic architecture is then transferred elastically to the large dataset to accelerate the search process there, as shown in the figure below.

The authors searched for the architecture on the CIFAR-10 dataset and then elastically transferred it to the large ImageNet dataset. Let's see the whole EAT-NAS process.

  1. Framework: first search for a top-performing architecture on CIFAR-10. Here it is MobileNetV2.
  2. Search Space: a well-designed search space is required, which consists of five elements (conv operation, kernel size, skip connection, width factor and depth factor).
  3. Population Quality: the selection of a top-performing model depends on its quality, which is decided by the mean and variance of the models' accuracies.
  4. Architecture Scale Search: the method also searches the width factor, denoting the expansion ratio of the filter count, and the depth factor, denoting the number of layers per block (in the selected MobileNetV2).
  5. Offspring Architecture Generator: after the basic architecture is transferred to the large dataset, a generator takes it as the initial seed. A transformation function is then applied to this model to generate the best architecture for the large dataset.

In the above figure, the upper architecture is the basic one searched on CIFAR-10 and transferred elastically to search for an architecture for the ImageNet dataset (the lower one). It takes 22 hours on 4 GPUs to search for the basic architecture on CIFAR-10 and 4 days on 8 GPUs to transfer it to ImageNet, which is quite modest compared to other neural architecture search methods. The authors also achieved 73.8% accuracy on ImageNet, surpassing architectures searched from scratch on ImageNet.

Referenced Research Paper: EAT-NAS

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Bit-plane Slicing

You probably know that everything on a computer is stored as strings of bits. In bit-plane slicing, we take advantage of this fact to perform various image operations. Let's see how.

I hope you have a basic understanding of the relationship between binary and decimal numbers.

For an 8-bit image, a pixel value of 0 is represented as 00000000 in binary form and 255 is encoded as 11111111. Here, the leftmost bit is known as the most significant bit (MSB), as it contributes the most: e.g. if the MSB of 11111111 is changed to 0 (i.e. 01111111), the value changes from 255 to 127. Similarly, the rightmost bit is known as the least significant bit (LSB).

In bit-plane slicing, we divide the image into bit planes. This is done by first converting the pixel values into binary form and then separating the bits into planes. Let's see an example.

For simplicity, let's take a 3×3, 3-bit image as shown below. We know that 3-bit pixel values range from 0 to 7.

Bit Plane Slicing

I hope you understand what bit plane slicing is and how it is performed. The next question that comes to mind is: what's the benefit of doing this?

Pros:

  • Image compression (we will see later how we can reconstruct nearly the original image using fewer bits).
  • Converting a grey level image to a binary image. In general, an image reconstructed from bit planes is similar to applying some intensity transformation function to the original image, e.g. the image reconstructed from the MSB alone is the same as applying a thresholding function to the original image. We will validate this in the code below.
  • It lets us analyze the relative importance of each bit in the image, which helps in determining the number of bits needed to quantize the image.

Let’s see how we can do this using OpenCV-Python

Code
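A minimal sketch (the filename is a placeholder):

```python
import cv2
import numpy as np

# Read the image as greyscale ('input.jpg' is a placeholder filename)
img = cv2.imread('input.jpg', 0)

# Plane i holds bit i of every pixel; shift it down and mask it out
bit_planes = [(img >> i) & 1 for i in range(8)]

# Scale each plane from {0, 1} to {0, 255} so it is visible on screen
for i, plane in enumerate(bit_planes):
    cv2.imshow('bit plane ' + str(i + 1), (plane * 255).astype(np.uint8))

cv2.waitKey(0)
cv2.destroyAllWindows()
```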

The output looks like this

Original Image
8 bit planes (Top row – 8,7,6,5 ; bottom – 4,3,2,1 bit planes)

Clearly from the above figure, the last 4 bit planes do not seem to have much information in them.

Now, if we combine the 8,7,6,5 bit planes, we will get approximately the original image as shown below.

Image using 4 bit planes (8,7,6,5)

This can be done by the following code
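Continuing from the snippet above, one simple way is to mask out the lower 4 bits:

```python
# Keep only the top 4 bits of every pixel (bit planes 8,7,6,5)
reconstructed = img & 0b11110000

cv2.imshow('Image from 4 bit planes', reconstructed)
cv2.waitKey(0)
cv2.destroyAllWindows()
```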

Clearly, storing these 4 bit planes instead of the original image requires less space. That is why bit-plane slicing is used in image compression.

I hope you understand Bit plane slicing. If you find any other application of this, please let me know. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Single Image Super-Resolution Using a Generative Adversarial Network

In recent years, neural networks have produced various breakthroughs in different areas. One of the promising results is super-resolving an image at large upscaling factors, as shown below.

Isn’t it difficult to produce a high resolution image from a low resolution image?

In the paper Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, the authors use a generative adversarial network (GAN) to produce a photo-realistic super-resolved image from a low resolution input at a 4x upscaling factor. In this blog, we will cover the following:

  1. The architecture of the GAN used in the paper.
  2. The loss function used for this problem.

Adversarial network architecture used in the paper:

The paper uses one generator and one discriminator model. The generator is fed LR images and tries to generate images that the discriminator finds difficult to distinguish from real HR images.

Source

Generator Network: the input LR image is passed through a 9×9 convolution with 64 filters and ParametricReLU. Then B residual blocks are applied, each with a 3×3 kernel and 64 filters, followed by batch normalization and ParametricReLU. Finally, two sub-pixel convolution layers upsample the image by 4x.

Discriminator Network: the discriminator distinguishes real HR images from generated SR images. It contains eight convolutional layers with an increasing number of 3×3 filter kernels, growing by a factor of 2 from 64 to 512. Strided convolutions reduce the image resolution each time the number of features is doubled. The resulting 512 feature maps are followed by two dense layers and a final sigmoid activation to obtain the probability that an image is real or fake.

Loss Function: the authors define a perceptual loss function consisting of a content loss and an adversarial loss.

The adversarial loss trains the generator to produce natural-looking images that the discriminator finds difficult to distinguish from real images. In addition, the authors use a content loss motivated by perceptual similarity.

For the content loss, mean squared error is the most widely used loss function, but it often produces perceptually unsatisfying, over-smoothed content. To resolve this, the authors use a loss function closer to perceptual similarity: a VGG loss defined on the ReLU activation layers of a pre-trained 19-layer VGG network.

They performed experiments on Set5, Set14 and BSD100 and tested on BSD300, achieving promising results. To evaluate SRGAN's results, the authors also collected mean opinion scores from 26 raters and found the results look much more similar to the original images.

Referenced Research Paper: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

GitHub: Super Resolution Examples

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Dimensionality Reduction for Data Visualization using Autoencoders

In the previous blog, I explained the concept behind autoencoders and their applications. In this blog, we will look at one interesting practical application of autoencoders.

Autoencoders are neural networks that are trained to reconstruct their original input. But merely reconstructing the input would be useless; the main purpose is to learn interesting features. In this blog, we will see how autoencoders can be used to learn features that help visualize high dimensional data.

Say you have a 10-dimensional vector; it is difficult to visualize directly, so you need to convert it into a 2-D or 3-D representation. There are well-known algorithms, like principal component analysis, used for dimensionality reduction. Interestingly, if you train an autoencoder that uses only linear activation functions with mean squared error as its loss function, it ends up performing principal component analysis.

Here we will visualize 3-dimensional data in 2 dimensions using a simple autoencoder implemented in Keras.

3-dimensional data

The autoencoder architecture for generating the 2-D representation is as follows:

  1. Input layer with 3 nodes.
  2. One hidden dense layer with 2 nodes and linear activation.
  3. One output dense layer with 3 nodes and linear activation.
  4. The loss function is MSE and the optimizer is Adam.

The following code will generate a compressed representation of input data.
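A minimal sketch in Keras (the random data stands in for the 3-D points shown above; training settings are example choices):

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Placeholder 3-D data; replace with your own points
X = np.random.rand(1000, 3)

# Autoencoder: 3 -> 2 -> 3, linear activations, MSE loss
inputs = Input(shape=(3,))
encoded = Dense(2, activation='linear')(inputs)
decoded = Dense(3, activation='linear')(encoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(X, X, epochs=50, batch_size=32, verbose=0)

# The encoder alone gives the compressed 2-D representation
encoder = Model(inputs, encoded)
X_2d = encoder.predict(X)
```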

Here is the generated 2-D representation of input 3-D data.

Compressed Representation

In a similar way, you can visualize high dimensional data as 2-dimensional or 3-dimensional vectors.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.