
Is the deconvolution layer the same as a convolutional layer?

Isn’t this an interesting topic? If you have worked with image classification problems (e.g. classifying cats and dogs) or image generation problems (e.g. GANs, autoencoders), you have surely encountered convolution and deconvolution layers. But what if someone says a deconvolution layer is the same as a convolution layer?

This paper proposes an efficient subpixel convolution layer that works the same as a deconvolution layer. To understand this, let’s first understand the convolution layer, the transposed convolution layer and the subpixel convolution layer.

Convolution Layer

In every convolutional neural network, the convolution layer is the most important part. A convolution layer consists of a number of independent filters, each of which convolves with the input and produces an output for the next layer. Let’s see how a filter convolves with the input.
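To make this concrete, here is a minimal NumPy sketch of a single filter sliding over a single-channel input with stride 1 and no padding (the function and variable names are my own illustration, not from the original post):

```python
import numpy as np

def conv2d(x, w):
    """Valid cross-correlation of one filter w over a single-channel input x.
    (What deep learning calls 'convolution' is usually cross-correlation.)"""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * w)
    return out

x = np.arange(16.0).reshape(4, 4)   # 4x4 input
w = np.ones((2, 2))                 # 2x2 filter
print(conv2d(x, w).shape)           # (3, 3): the output is smaller than the input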

Transposed and Subpixel Convolution Layers

Transposed convolution reverses the spatial transformation of a convolution (it is the transpose of the convolution operation, not a true mathematical inverse). In a convolution layer you try to extract useful features from the input, while in a transposed convolution you try to add useful detail to upscale an image. Transposed convolution has learnable weights which are learnt using backpropagation. Let’s see how a transposed convolution is done.
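One way to picture it, sketched below in NumPy (my own illustration, assuming no padding): every input pixel "paints" a copy of the kernel, scaled by that pixel's value, into the output at stride-spaced offsets, and overlapping contributions are summed.

```python
import numpy as np

def transposed_conv2d(x, w, stride=2):
    """Transposed convolution by scatter-add: each input pixel scales the
    kernel and adds it into the output at a stride-spaced location."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((stride * (H - 1) + kh, stride * (W - 1) + kw))
    for i in range(H):
        for j in range(W):
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * w
    return out

x = np.ones((4, 4))
w = np.ones((2, 2))
print(transposed_conv2d(x, w, stride=2).shape)   # (8, 8): the output is upscaled
```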

Similarly, a subpixel convolution is also used for upsampling an image. It applies a convolution with fractional strides (the input is padded with zeros between pixels) and outputs an upsampled image. Let’s see how.
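The zero insertion that a fractionally strided convolution implicitly performs can be sketched as follows (a minimal NumPy illustration of my own; an ordinary convolution over the result then produces the upsampled image):

```python
import numpy as np

def zero_insert(x, r=2):
    """Insert (r-1) zeros between neighbouring pixels, as a fractionally
    strided convolution implicitly does before convolving."""
    H, W = x.shape
    out = np.zeros((r * H - (r - 1), r * W - (r - 1)))
    out[::r, ::r] = x
    return out

x = np.arange(16.0).reshape(4, 4)
print(zero_insert(x, 2).shape)   # (7, 7): convolve this to get the upsampled image
```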

An Efficient Subpixel Convolution Layer

In this paper, the authors propose that upsampling with a deconvolution layer isn’t really necessary. So they came up with this idea: instead of inserting zero pixels between input pixels, they do the convolution in the lower-resolution image (producing more channels) and then apply periodic shuffling to produce an upscaled image.

[Figure from the paper: the efficient subpixel convolution layer; r denotes the upscaling ratio.]

The authors illustrate that a deconvolution layer with kernel shape (o, i, k*r, k*r) is the same as a convolution layer with kernel shape (o*r*r, i, k, k) in LR (low-resolution) space, where the shape is read as (output channels, input channels, kernel width, kernel height). Note that both kernels contain exactly the same number of weights, o*i*k*k*r*r, so no modelling capacity is lost. Let’s take an example of the proposed efficient subpixel convolution layer.
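As a quick sanity check on the weight counts, using the numbers from the example below:

```python
o, i, k, r = 1, 1, 2, 2                      # output ch, input ch, kernel size, ratio

deconv_weights = o * i * (k * r) * (k * r)   # kernel shape (o, i, k*r, k*r)
conv_weights = (o * r * r) * i * k * k       # kernel shape (o*r*r, i, k, k)

print(deconv_weights, conv_weights)          # 16 16 -> identical capacity
```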

[Figure from the paper: a worked example of the efficient subpixel convolution layer.]

In the above figure, the input image shape is (1, 4, 4) and the upscaling ratio (r) is 2. To achieve an image of size (1, 8, 8), the input image is first convolved with a kernel of shape (4, 1, 2, 2), which produces an output of shape (4, 4, 4); periodic shuffling is then applied to get the required upscaled image of shape (1, 8, 8). So instead of using a deconvolution layer with kernel shape (1, 1, 4, 4), the same can be done with this efficient subpixel convolution layer.
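The shape flow of this example can be reproduced with a short NumPy sketch (my own illustration; the kernel here is stored channels-last as (kh, kw, in, out), and the periodic shuffle is written inline as a reshape and transpose):

```python
import numpy as np

r = 2
lr = np.random.rand(4, 4, 1)          # LR input, (H, W, C) = (4, 4, 1)
w = np.random.rand(2, 2, 1, r * r)    # kernel (kh, kw, in, out) = (2, 2, 1, 4)

# 'same' convolution in LR space: the output keeps the 4x4 spatial size
pad = np.pad(lr, ((0, 1), (0, 1), (0, 0)))
feat = np.zeros((4, 4, r * r))
for i in range(4):
    for j in range(4):
        feat[i, j] = np.tensordot(pad[i:i+2, j:j+2], w, axes=([0, 1, 2], [0, 1, 2]))

# periodic shuffle: (H, W, C*r*r) -> (H*r, W*r, C)
H, W, _ = feat.shape
hr = feat.reshape(H, W, r, r, 1).transpose(0, 2, 1, 3, 4).reshape(H * r, W * r, 1)

print(feat.shape, hr.shape)           # (4, 4, 4) (8, 8, 1)
```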

Implementation

I have also implemented an autoencoder (on the MNIST dataset) with an efficient subpixel convolution layer. Let’s see the code for the efficient subpixel convolution.
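The original snippet is not reproduced here; the following is a minimal NumPy sketch of the periodic shuffle (PS) operator it implements, equivalent to what the linked TensorFlow code (and tf.nn.depth_to_space) does:

```python
import numpy as np

def periodic_shuffle(x, r):
    """Rearrange a (H, W, C*r*r) tensor into (H*r, W*r, C).
    Pixel (h, w) of channel block (a, b) lands at output position (h*r + a, w*r + b)."""
    H, W, Crr = x.shape
    C = Crr // (r * r)
    x = x.reshape(H, W, r, r, C)      # split the channels into an (r, r, C) grid
    x = x.transpose(0, 2, 1, 3, 4)    # interleave the grid spatially: (H, r, W, r, C)
    return x.reshape(H * r, W * r, C)
```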

The periodic shuffling code used in my implementation comes from the GitHub link referenced below. I then applied autoencoder layers to generate the image: to upsample in the decoder layers, the encoded feature maps are first convolved and then periodically shuffled, as in the sketch below.
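Here is a minimal sketch of one such decoder step in TensorFlow/Keras (the layer sizes and names are my own illustration, not the actual model from the post; tf.nn.depth_to_space performs the periodic shuffle natively):

```python
import tensorflow as tf
from tensorflow.keras import layers

r = 2  # upscaling ratio per decoder step

def subpixel_upsample_block(channels):
    """One decoder step: convolve in LR space to channels*r*r feature maps,
    then periodic-shuffle them into an (r*H, r*W, channels) output."""
    return tf.keras.Sequential([
        layers.Conv2D(channels * r * r, 3, padding="same", activation="relu"),
        layers.Lambda(lambda t: tf.nn.depth_to_space(t, r)),  # periodic shuffle
    ])

x = tf.random.normal((1, 7, 7, 16))          # e.g. an encoded MNIST feature map
print(subpixel_upsample_block(8)(x).shape)   # (1, 14, 14, 8)
```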

This type of subpixel convolution layer can be very helpful in problems like image generation (autoencoders, GANs) and image enhancement (super-resolution). There is also more to discover about what this efficient subpixel convolution layer can offer.

By now, you should have some intuition about the efficient subpixel convolution layer. Hope you enjoyed reading.

If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve. Goodbye until next time.

Referenced Research Paper: Is the deconvolution layer the same as a convolutional layer? (Shi et al., 2016)

Referenced GitHub Link: Subpixel