Tag Archives: python

Is the deconvolution layer the same as a convolutional layer?

Isn't this an interesting topic? If you have worked with image classification problems (e.g. classifying cats and dogs) or image generation problems (e.g. GANs, autoencoders), surely you have encountered convolution and deconvolution layers. But what if someone says a deconvolution layer is the same as a convolution layer?

This paper proposed an efficient sub-pixel convolution layer which works the same as a deconvolution layer. To understand this, let's first understand the convolution layer, the transposed convolution layer, and the sub-pixel convolution layer.

Convolution Layer

In every convolutional neural network, the convolution layer is the most important part. A convolution layer consists of a number of independent filters which convolve independently with the input and produce outputs for the next layer. Let's see how a filter convolves with the input.
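As a quick illustration (a minimal sketch, not the post's original code), here is a convolution layer with a handful of independent filters in Keras, printing the shape of its kernel:

```python
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

inp = Input(shape=(4, 4, 1))                  # a single-channel 4x4 input
out = Conv2D(4, (2, 2), padding='same')(inp)  # 4 independent 2x2 filters
model = Model(inp, out)

# Keras stores the kernel as (kernel height, kernel width, input channels, output channels)
print(model.layers[-1].kernel.shape)          # (2, 2, 1, 4)
```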

Transposed and sub pixel Convolution Layer

Transposed convolution is the inverse operation of convolution. In a convolution layer, you try to extract useful features from the input, while in a transposed convolution, you try to add useful features to upscale an image. A transposed convolution has learnable weights which are learnt using backpropagation. Let's see how to do a transposed convolution visually.

Similarly, a subpixel convolution is also used for upsampling an image. It applies fractional strides (the input is padded with in-between zero pixels) to an input and outputs an upsampled image. Let's see it visually.

An efficient sub pixel convolution Layer

In this paper, the authors propose that upsampling using a deconvolution layer isn't really necessary. So they came up with this idea: instead of putting in-between zero pixels in the input image, they do more convolution in the lower-resolution image and then apply periodic shuffling to produce an upscaled image.

Source
r denotes the upscaling ratio

The authors illustrate that a deconvolution layer with kernel size (o, i, k*r, k*r) is the same as a convolution layer with kernel size (o*r*r, i, k, k), i.e. (output channels, input channels, kernel width, kernel height), in LR (low-resolution) space. Let's take an example of the proposed efficient subpixel convolution layer.

Source

In the above figure, the input image shape is (1, 4, 4) and the upscaling ratio (r) is 2. To achieve an image of size (1, 8, 8), the input image is first convolved with a kernel of size (4, 1, 2, 2), which produces an output of shape (4, 4, 4), and then periodic shuffling is applied to get the required upscaled image of shape (1, 8, 8). So instead of using a deconvolution layer with kernel size (1, 1, 4, 4), the same can be done with this efficient sub-pixel convolution layer.

Implementation

I have also implemented an autoencoder (using the MNIST dataset) with an efficient subpixel convolution layer. Let's see the code for the efficient subpixel convolution.
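The embedded code isn't reproduced here, so below is a minimal sketch of an efficient sub-pixel convolution (the filter count and kernel size are my assumptions); tf.nn.depth_to_space performs the same periodic-shuffling rearrangement:

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Lambda

def subpixel_conv(x, out_channels, r):
    # convolve in low-resolution space, producing out_channels * r * r feature maps
    x = Conv2D(out_channels * r * r, (3, 3), padding='same')(x)
    # periodic shuffling: rearrange the channels into an r-times larger image
    return Lambda(lambda t: tf.nn.depth_to_space(t, r))(x)
```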

The original periodic shuffling (phase shift) implementation comes from the GitHub link referenced below. Autoencoder layers are then applied to generate the image: to upsample the image in the decoder layers, the encoded images are first convolved and then periodically shuffled.
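As a rough sketch (layer sizes are my assumptions, not the post's exact code), such an autoencoder can use the subpixel_conv helper defined above in its decoder:

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Activation
from tensorflow.keras.models import Model

inp = Input(shape=(28, 28, 1))
# encoder: compress the image down to 14x14
x = Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
x = MaxPooling2D((2, 2))(x)
# decoder: convolve in low resolution, then periodic shuffling upscales back to 28x28
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = subpixel_conv(x, out_channels=1, r=2)
out = Activation('sigmoid')(x)

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```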

This type of subpixel convolution layer can be very helpful in problems like image generation (autoencoders, GANs) and image enhancement (super resolution). There is also more to find out about what this efficient subpixel convolution layer can offer.

Now you should have some intuition about the efficient subpixel convolution layer. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Referenced Research Paper : Is the deconvolution layer the same as a convolutional layer?

Referenced GitHub Link : Subpixel

Python Closures

Python closures are related to nested functions (a function defined inside another function). As we discussed in the previous blog, nested functions can access variables from the enclosing scope, and these can be modified using the nonlocal keyword.

You cannot access these variables outside their scope, but a closure remembers these values even when the enclosing scope is no longer present in memory.

Now, let’s see how to define a closure
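The original snippet isn't reproduced here, so below is a minimal sketch consistent with the description that follows (the function name print_name and its argument are illustrative):

```python
def print_name(name):
    # nested function: remembers 'name' from the enclosing scope
    def printer():
        print(name)
    return printer          # return the nested function instead of calling it

name_closure = print_name("TheAILearner")
name_closure()              # still prints the name even though print_name has finished
```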

Here, the enclosing function returns the nested function instead of calling it. This returned function was bound to name_closure. On calling name_closure(), the name was still remembered although we had already finished executing the enclosing function.

This can be used for data hiding and as a substitute for a class when the class would contain only one method (though personally, I would always prefer using a class).

You can check whether a function is a closure with the help of the __closure__ attribute. It returns a tuple of cell objects if the function is a closure and None otherwise, as shown below.
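Continuing the sketch above:

```python
print(name_closure.__closure__)   # tuple of cell objects, e.g. (<cell at 0x...: str object at 0x...>,)

def not_a_closure():
    pass

print(not_a_closure.__closure__)  # None, since no enclosing variables are captured
```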

In the next blog, we will learn about decorators, which use closures as well. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Densely Connected Convolutional Networks – DenseNet

When we see a machine learning problem related to images, the first thing that comes to mind is a CNN (convolutional neural network). Different convolutional networks like LeNet, AlexNet, VGG16, VGG19, ResNet, etc. are used to solve different problems, whether supervised (classification) or unsupervised (image generation). Over the years, deeper and deeper CNN architectures have been used: as problems become more complex, deeper convolutional networks are preferred. But with deeper networks, the problem of vanishing gradients arises.

To solve this problem, Gao Huang et al. introduced Densely Connected Convolutional Networks (DenseNets). DenseNets have several compelling advantages:

  1. alleviate the vanishing-gradient problem
  2. strengthen feature propagation
  3. encourage feature reuse
  4. substantially reduce the number of parameters

How does DenseNet work?

Recent architectures like ResNet also try to solve the problem of vanishing gradients. ResNet passes information from one layer to another via identity connections; in ResNet, features are combined through summation before being passed into the next layer.

DenseNet, on the other hand, introduces connections from each layer to all its subsequent layers in a feed-forward fashion (as shown in the figure below). These connections are made using concatenation, not summation.

source: DenseNet

The ResNet architecture preserves information explicitly through identity connections; moreover, recent variants of ResNet show that many layers contribute very little and can in fact be randomly dropped during training. The DenseNet architecture explicitly differentiates between information that is added to the network and information that is preserved.

In DenseNet, each layer has direct access to the gradients from the loss function and to the original input signal, leading to an improved flow of information and gradients throughout the network. DenseNets also have a regularizing effect, which reduces overfitting on tasks with smaller training set sizes.

An important difference between DenseNet and existing network architectures is that DenseNet can have very narrow layers, e.g., k = 12. The paper refers to the hyperparameter k as the growth rate of the network. It means each layer in a dense block produces only k feature maps, and these k feature maps are concatenated with the feature maps of the previous layers and given as input to the next layer.

DenseNet Architecture

The best way to illustrate any architecture is with code. So, I have implemented the DenseNet architecture in Keras using the MNIST dataset.

A DenseNet consists of dense blocks, and each dense block consists of convolution layers. After a dense block, a transition layer is added to proceed to the next dense block (as shown in the figure below).

Every layer in a dense block is directly connected to all its subsequent layers. Consequently, each layer receives the feature-maps of all preceding layers.

Each convolution layer consists of three consecutive operations: batch normalization (BN), followed by a rectified linear unit (ReLU) and a 3 × 3 convolution (Conv). Dropout can also be added, depending on your architecture requirements.
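The full implementation is linked at the end of this post; here is a minimal sketch (layer sizes and dropout handling are my own) of the BN-ReLU-Conv block and a dense block that concatenates each layer's k feature maps:

```python
from tensorflow.keras.layers import (BatchNormalization, Activation, Conv2D,
                                     Dropout, Concatenate)

def conv_block(x, growth_rate, dropout_rate=None):
    # BN -> ReLU -> 3x3 Conv producing k (= growth_rate) new feature maps
    out = BatchNormalization()(x)
    out = Activation('relu')(out)
    out = Conv2D(growth_rate, (3, 3), padding='same')(out)
    if dropout_rate:
        out = Dropout(dropout_rate)(out)
    return out

def dense_block(x, num_layers, growth_rate):
    for _ in range(num_layers):
        out = conv_block(x, growth_rate)
        # concatenate the new k feature maps with all preceding feature maps
        x = Concatenate()([x, out])
    return x
```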

An essential part of convolutional networks is down-sampling layers that change the size of feature-maps. To facilitate down-sampling, the DenseNet architecture divides the network into multiple densely connected dense blocks (as shown in the figure earlier).

The layers between blocks are transition layers, which perform convolution and pooling. Each transition layer consists of a batch normalization layer and a 1×1 convolutional layer followed by a 2×2 average pooling layer.
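A corresponding sketch of the transition layer:

```python
from tensorflow.keras.layers import BatchNormalization, Conv2D, AveragePooling2D

def transition_layer(x, num_filters):
    # BN -> 1x1 Conv -> 2x2 average pooling (halves the spatial resolution)
    x = BatchNormalization()(x)
    x = Conv2D(num_filters, (1, 1), padding='same')(x)
    x = AveragePooling2D((2, 2), strides=2)(x)
    return x
```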

DenseNets can scale naturally to hundreds of layers while exhibiting no optimization difficulties. Because of their compact internal representations and reduced feature redundancy, DenseNets may be good feature extractors for various computer vision tasks that build on convolutional features.

The full code can be found here.

Referenced research paper: Densely Connected Convolutional Networks

Hope you enjoy reading. If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Object Tracking Using Color Models OpenCV-Python

In this tutorial, we will learn how we can use color models for object tracking. You can use any color model. Here, I have used HSI because it is easier to represent a color using the HSI model (as it separates the color component from greyscale). Let’s see how to do this

Steps

  1. Open the camera using cv2.VideoCapture()
  2. Create 3 Trackbars of H, S, and I using cv2.createTrackbar()
  3. Read frame by frame
  4. Record the current trackbar position using cv2.getTrackbarPos()
  5. Convert from BGR to HSV using cv2.cvtColor()
  6. Threshold the HSV image based on current trackbar position using cv2.inRange()
  7. Extract the desired result

Code
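The embedded code isn't reproduced here; below is a minimal sketch following the steps above (the threshold window around the trackbar values is my own choice):

```python
import cv2
import numpy as np

def nothing(x):
    pass

cap = cv2.VideoCapture(0)
cv2.namedWindow('Tracking')
# In OpenCV, Hue ranges from 0-179 while Saturation and Value range from 0-255
cv2.createTrackbar('H', 'Tracking', 0, 179, nothing)
cv2.createTrackbar('S', 'Tracking', 0, 255, nothing)
cv2.createTrackbar('I', 'Tracking', 0, 255, nothing)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # record the current trackbar positions
    h = cv2.getTrackbarPos('H', 'Tracking')
    s = cv2.getTrackbarPos('S', 'Tracking')
    i = cv2.getTrackbarPos('I', 'Tracking')
    # threshold around the current trackbar positions
    lower = np.array([max(h - 10, 0), max(s - 40, 0), max(i - 40, 0)])
    upper = np.array([min(h + 10, 179), 255, 255])
    mask = cv2.inRange(hsv, lower, upper)
    # extract the desired result
    result = cv2.bitwise_and(frame, frame, mask=mask)
    cv2.imshow('Tracking', result)
    if cv2.waitKey(1) & 0xFF == 27:   # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```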

Open it in full screen and play with the trackbars to get more intuition about the HSI model. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Color Models

In the previous blogs, we represented the color image using the RGB components, but this is not the only way available. There are different color models (a color model is simply a way to define color) available, each having its own pros and cons.

There are two types of color models available: Additive and Subtractive. Additive uses light (transmitted) to display color while subtractive models use printing inks. These models are fitted into different shapes to obtain new models (See HSI model below).

In this blog, we’ll discuss the three that are most commonly used in the context of digital image processing: RGB, CMY, and HSI

The RGB Color Model

In this, we construct a color cube whose 3 axes denote R, G, and B  respectively as shown below

Normalized RGB Color Cube; Source: Researchgate

This is an additive model, i.e. the colors present in the light add to form new colors. For example, yellow has coordinates (1,1,0), which means Yellow = Red + Green. Similarly for other colors: Cyan = Blue + Green and Magenta = Red + Blue.

R, G, and B are added together in varying proportions to produce an extensive range of colors. Mixing equal proportions of R, G, and B falls on the grayscale line.

Use: color monitors and most video cameras.

The CMYK Color Model

CMY stands for cyan, magenta, and yellow, also known as the secondary colors of light. K refers to black. Equal proportions of C, M, and Y produce a muddy black rather than pure black. That's why we use the CMYK model instead of CMY.

This is a subtractive model, i.e. colors are perceived as a result of reflected light. For example, when light falls on a cyan-coated surface, red is absorbed (or subtracted) while green and blue are reflected, and thus G + B = Cyan. Similarly for magenta and yellow.

Thus, CMY can be obtained from RGB by subtracting RGB from the max intensity.
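For an 8-bit image this is just a per-channel subtraction; a quick sketch ('image.jpg' is a placeholder path):

```python
import cv2

img = cv2.imread('image.jpg')   # 8-bit image, max intensity 255
# CMY = max intensity - RGB, applied per channel (OpenCV stores channels in BGR order)
cmy = 255 - img
```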

Use: printing, e.g. books, magazines, etc.

The HSI Color Model

HSI stands for Hue, Saturation, and Intensity. This model is similar to how humans perceive color. Let’s understand HSI terms

Hue: Color attribute that describes the pure color or dominant wavelength.

Saturation: Purity of Color or how much a pure color is diluted by white light.

Intensity: Amount of light

H and S tell us about the chromaticity (color information) of the light while I carries the greyscale information.

The HSI model can be obtained by rotating the RGB cube such that black is at the bottom and white is at the top.

H varies from 0 to 120 degrees for Red, 120 to 240 for Green, and 240 to 360 for Blue. Saturation can take values from 0 to 100%. The intensity range depends on the bit depth of the image.

Pros: Easier to represent the color than the RGB model.

Note: We can also use these color models for object tracking (See here).

In the next blog, we will see how different colors can be generated from these color models with the help of OpenCV. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Changing Video Resolution using OpenCV-Python

In this tutorial, I will show how to change the resolution of the video using OpenCV-Python. This blog is based on interpolation methods (Chapter-5) which we have discussed earlier.

Here, I will convert a 640×480 video to 1280×720. Let’s see how to do this

Steps:

  1. Load a video using cv2.VideoCapture()
  2. Create a VideoWriter object using cv2.VideoWriter()
  3. Extract frame by frame
  4. Resize the frames using cv2.resize()
  5. Save the frames to a video file using cv2.VideoWriter()
  6. Release the VideoWriter and destroy all windows

Code:
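A minimal sketch following the steps above ('input.avi', 'output.avi', and the codec are placeholders):

```python
import cv2

cap = cv2.VideoCapture('input.avi')               # 640x480 source video
fourcc = cv2.VideoWriter_fourcc(*'XVID')
fps = cap.get(cv2.CAP_PROP_FPS)
out = cv2.VideoWriter('output.avi', fourcc, fps, (1280, 720))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # upscale each frame using bicubic interpolation
    resized = cv2.resize(frame, (1280, 720), interpolation=cv2.INTER_CUBIC)
    out.write(resized)
    cv2.imshow('Resized', resized)
    if cv2.waitKey(1) & 0xFF == 27:               # press Esc to stop early
        break

cap.release()
out.release()
cv2.destroyAllWindows()
```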

Here, I have used bicubic as the interpolation method; you can use any. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Variational Autoencoders

Variational autoencoders are an extension of autoencoders and are used as generative models. You can generate data like text, images, and even music with the help of variational autoencoders.

Autoencoders are neural networks used to reconstruct their original input. To know more about autoencoders, please go through this blog. They have certain applications, like denoising autoencoders and dimensionality reduction for data visualization, but apart from that they are fairly limited.

To overcome this limitation, variational autoencoders come into the picture. A plain autoencoder learns a function that is not trained to generate images from a particular distribution. Also, if you try to create a generative model using an autoencoder, you do not want it to reproduce the same data as the input; you want output data with some variation that still mostly looks like the input data.

Variational Autoencoder Model

A variational autoencoder has an encoder and a decoder, mostly the same as an autoencoder. The difference is that instead of producing a compressed representation directly from its encoder, it learns a latent variable model: the latent variables define a probability distribution from which the input to the decoder is sampled. Another difference is that instead of using a mean squared error or cross-entropy loss function (as in autoencoders), it has its own loss function.

I will not go further into the mathematics behind it. Let's jump into the code, which will give a better understanding of variational autoencoders. To know more about the mathematics behind it, please go through this tutorial.

I have implemented a variational autoencoder in Keras using the MNIST dataset. So let's first download the data.
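A minimal sketch of the data preparation (here the images are flattened to 784-dimensional vectors; the original post may have preprocessed them differently):

```python
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# flatten the 28x28 images into 784-dimensional vectors
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)
```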

Now create an encoder model, as we did for autoencoders.
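A minimal fully-connected sketch (the layer size is my assumption):

```python
from tensorflow.keras.layers import Input, Dense

inputs = Input(shape=(784,))
h = Dense(256, activation='relu')(inputs)   # encoder hidden layer
```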

Latent Distribution Parameters and Function

Now encode the output of the encoder into the latent distribution parameters. Here, I have created two parameters, mu and sigma, which represent the mean and standard deviation of the distribution.

Here I have taken the latent space dimension equal to 2. This is the bottleneck, which means we are squeezing our entire dataset through just two variables. If we increase the latent space dimension to 5, 10, or higher, we can get better results in the output, but this also passes more data through the bottleneck.
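Continuing the sketch (here sigma is parameterized through its logarithm, a common choice and an assumption on my part):

```python
latent_dim = 2                       # dimension of the latent space (the bottleneck)
mu = Dense(latent_dim)(h)            # mean of the latent distribution
log_sigma = Dense(latent_dim)(h)     # log standard deviation of the latent distribution
```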

Now create a sampling function based on a Gaussian distribution with mean zero and standard deviation 1. This distribution gives variation in the input to the decoder, which helps to get variation in the output. The decoder then predicts the output from this sample.
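Continuing the sketch, the sampling function draws epsilon from N(0, 1) and shifts and scales it with mu and sigma (the reparameterization trick); the decoder then maps the sample back to an image. The decoder sizes are my assumptions:

```python
from tensorflow.keras.layers import Lambda
from tensorflow.keras import backend as K

def sample_z(args):
    mu, log_sigma = args
    # epsilon ~ N(0, 1) provides the variation
    eps = K.random_normal(shape=(K.shape(mu)[0], latent_dim), mean=0., stddev=1.)
    return mu + K.exp(log_sigma) * eps

z = Lambda(sample_z)([mu, log_sigma])

# decoder: mirror of the encoder, defined as shared layers so we can reuse them later
decoder_hidden = Dense(256, activation='relu')
decoder_out = Dense(784, activation='sigmoid')
outputs = decoder_out(decoder_hidden(z))
```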

Loss Function

For the loss function, a variational autoencoder uses the sum of two losses. One is the generative loss, a binary cross-entropy loss that measures how accurately the image is reconstructed; the other is the latent loss, a KL divergence loss that measures how closely the latent variables match a Gaussian distribution. This KL divergence makes sure that the distribution generated by the encoder does not drift away from the origin. Then train the model.
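A sketch of the combined loss and training step, following the classic Keras VAE example with add_loss so the KL term can use mu and log_sigma directly (depending on your Keras/TensorFlow version, the loss wiring may need small adjustments):

```python
from tensorflow.keras.models import Model

vae = Model(inputs, outputs)

# generative (reconstruction) loss: binary cross-entropy summed over pixels
recon_loss = K.sum(K.binary_crossentropy(inputs, outputs), axis=-1)
# latent loss: KL divergence between N(mu, sigma^2) and N(0, 1)
kl_loss = -0.5 * K.sum(1 + 2 * log_sigma - K.square(mu) - K.exp(2 * log_sigma), axis=-1)
vae.add_loss(K.mean(recon_loss + kl_loss))

vae.compile(optimizer='adam')
vae.fit(x_train, epochs=20, batch_size=128, validation_data=(x_test, None))
```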

Our model is ready and we can generate images from it very easily. All we need to do is sample a latent variable from the distribution and pass it to the decoder. Let's test it with the following code:
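Continuing the sketch, we build a stand-alone generator from the shared decoder layers and feed it latent vectors sampled from N(0, 1):

```python
import matplotlib.pyplot as plt

# stand-alone decoder: latent vector -> image
decoder_input = Input(shape=(latent_dim,))
generator = Model(decoder_input, decoder_out(decoder_hidden(decoder_input)))

# sample latent variables from the prior N(0, 1) and decode them
z_sample = np.random.normal(size=(10, latent_dim))
generated = generator.predict(z_sample)

plt.figure(figsize=(10, 2))
for i in range(10):
    plt.subplot(1, 10, i + 1)
    plt.imshow(generated[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
```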

Here is the output generated from the sampled distribution by the above code.

The full code can be found here.

Hope you understand the basics of variational autoencoders. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Referenced papers: Auto-Encoding Variational Bayes, Tutorial on Variational Autoencoders

Denoising Autoencoders

In my previous blog, we discussed what an autoencoder is, its applications, and a simple implementation in Keras. In this blog, we will see a variant of the autoencoder – the 'denoising autoencoder'.

A denoising autoencoder is an extension of the autoencoder. An autoencoder tries to learn the identity function (output equals input), which puts it at risk of not learning useful features. One method to overcome this problem is to use denoising autoencoders.

To train a denoising autoencoder, we need noisy input data. For that, we add some noise to the original images. The amount of corruption depends on the amount of information present in the data; usually, 25-30% of the data is corrupted, and this can be higher if your data contains less information. Let's see how you can add noise to the data in code:
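A sketch, assuming x_train and x_test are MNIST images already scaled to the [0, 1] range (the noise factor is my choice):

```python
import numpy as np

noise_factor = 0.3   # roughly 30% corruption; adjust to your data
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)

# keep the pixel values inside the valid [0, 1] range
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
```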

To calculate the loss, the output of the denoising autoencoder is compared to the original input instead of the corrupted one. Such a loss function trains the model to learn interesting features rather than just the identity function.
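With the autoencoder model from the previous blog (assumed here), that simply means fitting on the noisy inputs against the clean targets:

```python
# train on noisy inputs but compare the output with the clean originals
autoencoder.fit(x_train_noisy, x_train,
                epochs=20, batch_size=128,
                shuffle=True,
                validation_data=(x_test_noisy, x_test))
```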

I have implemented a denoising autoencoder in Keras using MNIST data, which will give you an overview of how a denoising autoencoder works.

Following is the result of the denoising autoencoder.

The full code can be found here.

Hope you understand the usefulness of denoising autoencoders. In the next blog, we will cover variational autoencoders. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Autoencoders

Let's start with a simple definition of autoencoders: 'Autoencoders are neural networks trained to reconstruct their original input.'

Now, you might be thinking: what's the use of reconstructing the same data? Let me give you an example. If you want to transfer data that is gigabytes in size, and somehow you can compress it into megabytes and then reconstruct it back to the original size, isn't that a better way to transfer data? This is one of the applications of autoencoders.

Autoencoders generally consist of two parts: an encoder and a decoder. The encoder compresses the data into a smaller number of features, and the decoder upscales the extracted features back to the original.

There are some practical applications of autoencoders:

  1. Dimensionality reduction for data visualization
  2. Image Denoising
  3. Generative Models

Visualizing a 10-dimensional vector is difficult. To overcome this problem, we need to reduce that 10-dimensional vector to 2-D or 3-D. One famous algorithm, PCA (Principal Component Analysis), tries to solve this problem. PCA uses linear transformations, while autoencoders can use both linear and non-linear transformations for dimensionality reduction, which allows autoencoders to generate more complex and interesting features than PCA.

Autoencoders can be used to remove the noise present in an image. They can also be used to generate new images required for a specific task. We will see more about these two applications in the next blogs.

Now, let’s start with the simple implementation of autoencoders in Keras using MNIST data. First, let’s download MNIST training and test data and reshape it.
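A minimal sketch of the data preparation:

```python
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# reshape to (samples, height, width, channels) for the convolutional layers
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
```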

Encoder

MNIST data consists of images of digits, so it is better to use a convolutional neural network in our encoder and decoder. In the encoder, I have used conv and max-pooling layers to extract the compressed representation, and then flattened the encoder output to 32 features, which will be the input to the decoder.
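A sketch of such an encoder (the exact number of filters is my assumption):

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

input_img = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)   # 28x28x16
x = MaxPooling2D((2, 2))(x)                                            # 14x14x16
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)            # 14x14x8
x = MaxPooling2D((2, 2))(x)                                            # 7x7x8
x = Flatten()(x)
encoded = Dense(32, activation='relu')(x)    # compressed representation of 32 features
```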

Decoder

In the decoder, we need to upsample the extracted 32 features to the original size of the image. To achieve this, I have used the Conv2DTranspose layer from Keras. The final layer of the decoder then gives the reconstructed output, which should be similar to the original input.
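Continuing the sketch, the decoder upsamples the 32 features back to a 28x28 image with Conv2DTranspose layers:

```python
from tensorflow.keras.layers import Dense, Reshape, Conv2DTranspose
from tensorflow.keras.models import Model

x = Dense(7 * 7 * 8, activation='relu')(encoded)
x = Reshape((7, 7, 8))(x)
x = Conv2DTranspose(8, (3, 3), strides=2, activation='relu', padding='same')(x)    # 14x14x8
x = Conv2DTranspose(16, (3, 3), strides=2, activation='relu', padding='same')(x)   # 28x28x16
decoded = Conv2DTranspose(1, (3, 3), activation='sigmoid', padding='same')(x)      # 28x28x1

autoencoder = Model(input_img, decoded)
```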

To minimize the reconstruction loss, we train the network on a large dataset and update the weights. Now that our model is created, the next thing is to compile and train it.
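A sketch of the compile and training step (the number of epochs is my choice):

```python
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,
                epochs=20, batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test))
```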

Below are the results from the autoencoder trained above. The first line of digits shows the original input (test images) while the second line shows the reconstructed inputs from the model.

The full code can be found here.

Hope you understand the basics of autoencoders, where they can be used, and how a simple autoencoder can be implemented. In the next blog, we will see how to denoise an image using autoencoders. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Referenced Research Paper: http://proceedings.mlr.press/v27/baldi12a/baldi12a.pdf

Snake Game with Genetic Algorithm

Before moving forward, let's first summarize what we have done till now. In the first blog, we created a snake game using pygame. Then in the next blog, using backpropagation, we let the neural network learn how to play the snake game.

In this blog, we will let the genetic algorithm (GA) and neural network(NN) play the snake game (if you are new to genetic algorithm please refer to this blog).

But let's first clear up some doubts that naturally come to mind.

What is the advantage of using GA + NN?

We don’t need any training data.

Note: I am not saying this is better than backpropagation, but for the snake game problem, creating valid training data is a tough task, so this is a good option besides reinforcement learning, which we will see later.

Where will we use GA in the NN?

Instead of backpropagation, we will update weights using GA.

How will we use GA in the NN?

This whole process can be easily summarized in 7 steps:

  1. Creating a snake game and deciding neural network architecture.
  2. Creating an initial population.
  3. Deciding the fitness function.
  4. Play a game for each individual in the population and sort the population based on the fitness score.
  5. Select a few top individuals from the population and create the remaining population from these top selected individuals using Crossover and mutation.
  6. The new population is created (meaning the next generation).
  7. Go to step 4 and repeat until the stopping criteria are satisfied.

Now, I hope most of the things might be clear to you. Let’s see these steps in more detail

Creating a snake game and deciding neural network architecture

The snake game is created with pygame, and the network architecture is the same as in the previous blog: 7 units in the input layer, 3 units in the output layer with 'softmax', and 2 hidden layers, one of 9 units and the other of 15 units, with 'relu', as shown below.
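To make the architecture concrete, here is an equivalent Keras model (shown only for illustration; the implementation in this post represents the weights as a flat array, as in the next section):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(9, input_shape=(7,), activation='relu'),   # first hidden layer
    Dense(15, activation='relu'),                    # second hidden layer
    Dense(3, activation='softmax')                   # one unit per possible move
])
```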

Creating Initial Population

Here, I have chosen 50 individuals in the population, and each individual is an array of the weights of the neural network. Randomly initialize these individuals. See the code below.
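A sketch (the flat-weight layout without biases and the initialization range are my simplifications):

```python
import numpy as np

population_size = 50
# total number of weights in the 7-9-15-3 network (biases omitted in this sketch)
n_weights = 7 * 9 + 9 * 15 + 15 * 3

# each individual is a flat array of weights, initialised uniformly in [-1, 1]
population = np.random.uniform(low=-1, high=1, size=(population_size, n_weights))
```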

Deciding the Fitness Function

You can use any fitness function. Here, I have used the following fitness function

Every time the snake grabs the food, I give 5000 reward points, and if it collides with the boundary or itself, I apply a penalty of 150 points.

You can find this inside the run_game_with_ML() code (Last line).

Evaluating the population (Step – 4)

For each individual, a game is played and the fitness value is calculated, which is then appended to a list as shown in the code below.
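A sketch of this loop; run_game_with_ML() is the helper from the full code that plays one game with the given weights and returns the fitness score described above (its exact signature is an assumption here):

```python
def cal_pop_fitness(population):
    fitness = []
    for individual in population:
        # play one full game with this individual's weights and record its fitness
        fitness.append(run_game_with_ML(individual))   # signature assumed
    return np.array(fitness)

fitness = cal_pop_fitness(population)
```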

Selection, Crossover, and Mutation (Step – 5)

Selection: Now, according to the fitness values, the best individuals are selected from the population and stored in the 'parents' array as shown in the code below.
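A sketch (the number of parents kept is my choice):

```python
def select_best_individuals(population, fitness, num_parents):
    # pick the individuals with the highest fitness scores
    best_idx = np.argsort(fitness)[::-1][:num_parents]
    return population[best_idx]

parents = select_best_individuals(population, fitness, num_parents=12)
```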

Crossover: To produce children for the next generation, crossover is used. First, two individuals are randomly selected from the best; then I randomly choose some weight values from the first and some from the second individual to produce a new offspring. This process is repeated until the total population size is reached, as shown below.
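A sketch of this uniform crossover:

```python
def crossover(parents, num_offspring, n_weights):
    offspring = np.empty((num_offspring, n_weights))
    for k in range(num_offspring):
        # randomly pick two distinct parents
        p1, p2 = np.random.choice(parents.shape[0], size=2, replace=False)
        # for every weight, randomly take the value from the first or the second parent
        mask = np.random.rand(n_weights) < 0.5
        offspring[k] = np.where(mask, parents[p1], parents[p2])
    return offspring

offspring = crossover(parents, population_size - parents.shape[0], n_weights)
```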

Mutation: Then, some variation is added to the newly formed offspring. Here, for each child, I randomly select 25 weights and mutate them by adding some random value, as shown in the code below.
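A sketch (the size of the random perturbation is my choice):

```python
def mutate(offspring, num_mutations=25):
    for child in offspring:
        # pick 25 random weights and perturb each with a small random value
        idx = np.random.randint(0, child.shape[0], num_mutations)
        child[idx] += np.random.uniform(-0.5, 0.5, num_mutations)
    return offspring

offspring = mutate(offspring)
```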

New Population Created

With the help of the fitness function, crossover, and mutation, the new population for the next generation is created. The next thing is to replace the previous population with this newly formed one. Below is the code for this.
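A sketch of this replacement step:

```python
# the selected parents survive unchanged; the rest of the population is the mutated offspring
new_population = np.empty_like(population)
new_population[:parents.shape[0]] = parents
new_population[parents.shape[0]:] = offspring
population = new_population
```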

Now we repeat this process until our target game score is achieved. In this problem, I used 100 generations for training and 2500 steps per game, which was able to achieve a maximum score of 40. You can choose a higher number of steps per game and achieve a higher score.

The full code can be found here.

In the next blog, we will see how the genetic algorithm can be applied for neural network architecture search. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.