An Introduction To The Progressive Growing of GANs

Generative adversarial networks are famous for generating images, but generating high-resolution images was quite difficult until the introduction of a new training methodology known as progressive growing of GANs. The progressive growing GAN architecture was proposed by NVIDIA in the 2017 paper "Progressive Growing of GANs for Improved Quality, Stability, and Variation". Training starts with low-resolution images, such as 4×4, and then progressively adds layers to generate images of higher resolution, up to 1024×1024.

Traditional GANs faced some real issues when generating high-quality images. Here are some of the major problems:

  1. The discriminator can easily distinguish between real and fake when the generated images are large.
  2. Generating such high-resolution images also requires a lot of GPU memory due to the higher computational cost.
  3. Because of the high memory requirement, we are forced to use a smaller batch size, which makes the GAN model unstable.
  4. It was also difficult to produce images that are both large and finely detailed.

Progressive growing of GANs removes some of the obstacles that stood in the way of creating high-quality images. Some of its benefits are:

  1. It reduces training time.
  2. The model becomes more stable, since we can train it with mini-batches of a reasonable size.

Generally, a generative adversarial network consists of two networks, a generator and a discriminator. The generator takes a latent vector as input and produces a generated image, and the discriminator classifies these generated images against the original ones as real vs. fake. Training proceeds until the images produced by the generator fool the discriminator about half the time. Similarly, the progressive GAN architecture consists of both generator and discriminator networks, where the two networks are mirror images of each other.

The Network Architecture

Both the generator and the discriminator start with a very small image size of 4×4. The original images are downscaled to 4×4 to train the model, and since these images are quite small, training is fast. Once the discriminator has been fed enough 4×4 images, we progressively add new layers for 8×8 images, then 16×16, and so on, until we reach a resolution of 1024×1024. Nearest-neighbor interpolation and average pooling are used for doubling and halving the image size, respectively. The transition from the 4×4 network to the 8×8 network is done smoothly by fading in the new layers.
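
As a minimal sketch (PyTorch is assumed here, and the tensor sizes are just for illustration), the doubling and halving operations look like this:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)                              # a batch with one 8x8 RGB image
up = F.interpolate(x, scale_factor=2, mode="nearest")    # 8x8 -> 16x16, nearest-neighbor doubling
down = F.avg_pool2d(x, kernel_size=2)                    # 8x8 -> 4x4, average-pooling halving
```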

1. Fading in New Layers

We will illustrate this fading in using the example of the transition from 16×16 to 32×32 resolution images.

In this example, the current image resolution is 16×16. First, the model is trained on 16×16 images; the original images are downscaled to 16×16 for training. After training on a sufficient number of images, we progressively add new layers. In the generator, nearest-neighbor filtering is used to upsample the image, while in the discriminator, average pooling is applied to downsample it.

Now, to grow the network progressively, a new block for the 32×32 resolution is added. During training, this new block is not switched in directly but faded in. In the generator, the block consists of two convolution layers and one upsampling layer; in the discriminator, it consists of two convolution layers and one average-pooling layer. The output of the new block is multiplied by α and the output of the previous (16×16) path is multiplied by (1 − α), where α increases linearly from 0 to 1 over the course of the transition. Even after the new layers have been faded in, all previous layers in the model remain trainable.
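
A minimal sketch of this α blend during the transition (PyTorch is assumed; the shapes and names are illustrative, not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def fade_in(alpha, upsampled_old, new_block_out):
    # Blend the new high-resolution path with the upsampled old path.
    # alpha grows linearly from 0 to 1 over the transition phase.
    return alpha * new_block_out + (1.0 - alpha) * upsampled_old

old_rgb_16 = torch.randn(4, 3, 16, 16)                                      # output of the 16x16 stage (RGB)
upsampled_old = F.interpolate(old_rgb_16, scale_factor=2, mode="nearest")   # 16x16 -> 32x32
new_rgb_32 = torch.randn(4, 3, 32, 32)                                      # output of the new 32x32 block (RGB)
blended = fade_in(alpha=0.3, upsampled_old=upsampled_old, new_block_out=new_rgb_32)
```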

Similarly, if you want to produce images of even higher resolution, more layers are added progressively. A 1×1 convolution layer (toRGB) is added after the last layer of the generator to convert the feature maps into an RGB image, and a corresponding 1×1 convolution layer (fromRGB) is added at the top of the discriminator to project the RGB image (real or generated) back into feature maps.
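
These toRGB and fromRGB heads are just 1×1 convolutions; a small sketch (the channel count of 512 is an assumption here, the paper varies it with resolution):

```python
import torch.nn as nn

to_rgb = nn.Conv2d(in_channels=512, out_channels=3, kernel_size=1)    # generator head: features -> RGB
from_rgb = nn.Conv2d(in_channels=3, out_channels=512, kernel_size=1)  # discriminator head: RGB -> features
```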

During training, the progressive GAN starts from the 4×4 size and adds layers progressively until it reaches 1024×1024. Leaky ReLU activations are used throughout the model. Training took 4 days on 8 Tesla V100 GPUs.

2. Minibatch Standard Deviation

Generative adversarial networks have a tendency to capture only a little of the variation present in the training data; sometimes all input noise vectors generate similar-looking images. This problem is known as 'mode collapse'. To add more variation to the generated images, the authors of progressive GANs use minibatch standard deviation.

Here, the standard deviation of each feature at each spatial location is first computed across the minibatch. These standard deviations are then averaged over all features and spatial locations to obtain a single value, which is replicated into one extra (constant) feature map. This new feature map is concatenated towards the end of the discriminator network.
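
A minimal sketch of such a minibatch standard deviation layer (PyTorch assumed; this simplified version computes the statistic over the whole minibatch):

```python
import torch

def minibatch_stddev(x, eps=1e-8):
    # x: (N, C, H, W) activations near the end of the discriminator.
    std = torch.sqrt(x.var(dim=0, unbiased=False) + eps)   # per-feature, per-pixel std over the batch
    mean_std = std.mean()                                   # averaged to a single scalar
    extra = mean_std * torch.ones(x.size(0), 1, x.size(2), x.size(3), device=x.device)
    return torch.cat([x, extra], dim=1)                     # append one constant feature map: (N, C+1, H, W)
```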

3. Equalized Learning Rate

In the progressive GAN architecture, the authors do not initialize the weights carefully; instead, they scale the weights dynamically at run time, setting ŵ_i = w_i / c, where w_i are the weights and c is the per-layer normalization constant from He's initializer. In general, with modern initializers, some parameters have a larger dynamic range than others, which causes them to converge later; this effectively means the network has both low and high learning rates at the same time. The equalized learning rate ensures that the effective learning rate is the same for all weight parameters.
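
A minimal sketch of a convolution layer with equalized learning rate (PyTorch assumed; the multiplicative form, scaling the N(0, 1) weights by sqrt(2 / fan_in) at run time, is how this is commonly implemented):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class EqualizedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        # Weights are drawn from N(0, 1); no careful initialization.
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Per-layer constant from He's initializer, applied at run time instead of at initialization.
        self.scale = math.sqrt(2.0 / (in_ch * kernel_size * kernel_size))
        self.padding = padding

    def forward(self, x):
        return F.conv2d(x, self.weight * self.scale, self.bias, padding=self.padding)
```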

4. Pixel-wise Feature Vector Normalization in Generator

Generally, in generative adversarial networks, batch normalization is used after the convolutional layers. In progressive GAN, however, the feature vector at each pixel is normalized to unit length after every convolution layer. This normalization is applied only in the generator network, not in the discriminator. The technique effectively prevents signal magnitudes from escalating during training.
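
A minimal sketch of this pixel-wise normalization (PyTorch assumed), where each pixel's feature vector a_{x,y} is divided by sqrt(mean_j((a_{x,y}^j)^2) + ε):

```python
import torch
import torch.nn as nn

class PixelNorm(nn.Module):
    def __init__(self, eps=1e-8):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        # x: (N, C, H, W); normalize the C-dimensional feature vector at each pixel.
        return x / torch.sqrt(x.pow(2).mean(dim=1, keepdim=True) + self.eps)
```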

This new architecture, with its ideas of minibatch standard deviation, an equalized learning rate, fading in new layers, and pixel-wise normalization, has shown very promising results. With progressive growing, the model is able to generate high-resolution, photo-realistic synthetic images, and training is quite stable.

Referenced Research Paper: Progressive Growing of GANs for Improved Quality, Stability, and Variation

Hope you enjoyed reading.

If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve. Goodbye until next time.
