Information Maximizing Generative Adversarial Network (InfoGAN): Introduction and Implementation

InfoGAN is an extension of generative adversarial networks. A GAN is trained to generate new images that look similar to the original images, but it does not provide any control over how the new images are generated. Say you have trained a GAN to generate new faces that look similar to a given dataset; you will have no control over attributes of these faces such as eye colour, hairstyle, etc. With InfoGAN we can achieve such control, because InfoGAN is able to learn a disentangled representation.

Introduction

A generative adversarial network consists of two networks – a generator and a discriminator. Both networks are trained in an adversarial manner: while the generator tries to generate images similar to the original images, the discriminator tries to differentiate between images generated by the generator and original images. Training continues until the discriminator is fooled about half the time by the generator and the generator is able to generate images similar to the original images.

Control Variables

In a standard GAN, a random noise vector is given as input to the generator network. This noise carries no information about how the outputs should be generated. InfoGAN instead conditions generation on a latent code in addition to the noise vector. The input to the generator of the InfoGAN is given in two parts:

  1. A continuous noise vector, z.
  2. Latent codes, c, which can be both discrete and continuous.

Let's say we have trained our InfoGAN on the MNIST handwritten digit dataset. Here a discrete latent code (0-9) can be used to generate a specific digit between 0-9, while continuous latent codes can be used to generate digits with varying thickness and orientation.

Mutual Information

InfoGAN stands for information maximizing GAN. To maximize information, InfoGAN uses mutual information. In information theory, the mutual information between X and Y, I(X; Y), measures the "amount of information" learned about the random variable X from knowledge of the random variable Y. In InfoGAN there should be high mutual information between the latent code c and the generated images.
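In the paper, this is expressed as an information-regularized minimax game, where λ is a hyperparameter weighting the information term:

min_G max_D V_I(D, G) = V(D, G) − λ I(c; G(z, c))

Since I(c; G(z, c)) is hard to compute directly (it requires the posterior P(c|x)), the paper maximizes a variational lower bound using an auxiliary distribution Q(c|x), which is exactly what the auxiliary network described next approximates.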

To maximize this mutual information, the InfoGAN model requires an extra network called the auxiliary model. This auxiliary model shares all the weights of the discriminator network except the output layer. While the discriminator's output layer predicts whether the given input image is real or fake, the auxiliary network's output layer predicts the latent codes.

So the InfoGAN consists of three networks – a generator, a discriminator, and an auxiliary network. Both the discriminator and the auxiliary network are used to improve the generator network: the discriminator network regularizes the generator towards producing real-looking images, while the auxiliary network regularizes it towards maximizing the mutual information.

Implementation

In this blog, we will implement InfoGAN using the MNIST handwritten digit dataset. To maximize the information, we will only use discrete codes to generate particular digits. In addition, you could also use two continuous variables to control the rotation and thickness of the generated digits.

Imports and Initialization
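Below is a minimal setup sketch, assuming TensorFlow 2.x with tf.keras (the library choice and the hyperparameter values are assumptions for illustration). The dimensions follow this post: a 100-dimensional noise vector, a 10-dimensional one-hot latent code, and 28×28×1 images.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.datasets import mnist

noise_dim = 100          # size of the continuous noise vector z
code_dim = 10            # size of the one-hot discrete latent code c
img_shape = (28, 28, 1)  # MNIST image shape
```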

Generator Network

The input to the generator network is a vector of size 110, where 100 is the noise vector size and 10 is the latent code size. Here the latent code is a one-hot encoded discrete number between 0-9. I have used deconvolutional (transposed convolution) layers to upsample and finally produce an output of shape (28, 28, 1). Batch normalization is used to improve the quality of the trained network and to stabilize training.
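A sketch of such a generator, continuing the setup above; the exact number of filters and the kernel sizes are assumptions:

```python
def build_generator():
    # z (100) concatenated with the one-hot code c (10) -> 110-dim input
    inp = layers.Input(shape=(noise_dim + code_dim,))
    x = layers.Dense(7 * 7 * 128)(inp)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Reshape((7, 7, 128))(x)
    # Upsample 7x7 -> 14x14 -> 28x28 with transposed convolutions
    x = layers.Conv2DTranspose(64, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2DTranspose(1, 4, strides=2, padding='same',
                               activation='tanh')(x)
    return models.Model(inp, x, name='generator')
```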

Discriminator and Auxiliary Network

As mentioned above, the auxiliary network shares all the weights of the discriminator network except the output layer, so there is no need to create two separate functions. The networks take images of shape (28, 28, 1) as input. Convolutional, batch normalization, and pooling layers are used to build the shared trunk. The output of the discriminator network has shape 1, as it only predicts whether the input image is real or fake, while the output of the auxiliary network has shape 10, as it predicts the latent code.
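A sketch of the shared discriminator/auxiliary pair: both models are built over the same layer objects, so the trunk weights are shared and only the output heads differ (layer sizes and activation slopes are assumptions):

```python
def build_discriminator_and_auxiliary():
    inp = layers.Input(shape=img_shape)
    x = layers.Conv2D(64, 3, padding='same')(inp)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.MaxPooling2D()(x)                 # 28x28 -> 14x14
    x = layers.Conv2D(128, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.MaxPooling2D()(x)                 # 14x14 -> 7x7
    x = layers.Flatten()(x)
    # Shared trunk ends here; only the heads below differ.
    real_fake = layers.Dense(1, activation='sigmoid')(x)    # discriminator head
    code = layers.Dense(code_dim, activation='softmax')(x)  # auxiliary head
    discriminator = models.Model(inp, real_fake, name='discriminator')
    auxiliary = models.Model(inp, code, name='auxiliary')
    return discriminator, auxiliary
```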

Combined Model

A combined model is created to train the generator network. Here we set the discriminator network as non-trainable, since the discriminator is trained separately. The combined model takes the random noise and latent code as input, feeds them to the generator network, and feeds the generated image to both the discriminator and the auxiliary network.
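A sketch of wiring the three networks together. Note that in Keras the trainable flag is captured at compile time, so compiling the standalone discriminator and auxiliary models before freezing the discriminator keeps them trainable on their own (optimizer settings are assumptions):

```python
generator = build_generator()
discriminator, auxiliary = build_discriminator_and_auxiliary()

# Compile the standalone networks first so they remain trainable on their own.
discriminator.compile(optimizer=optimizers.Adam(2e-4, 0.5),
                      loss='binary_crossentropy')
auxiliary.compile(optimizer=optimizers.Adam(2e-4, 0.5),
                  loss='categorical_crossentropy')

# Freeze the discriminator inside the combined model only.
discriminator.trainable = False
gan_input = layers.Input(shape=(noise_dim + code_dim,))
img = generator(gan_input)
combined = models.Model(gan_input, [discriminator(img), auxiliary(img)])
combined.compile(optimizer=optimizers.Adam(2e-4, 0.5),
                 loss=['binary_crossentropy', 'categorical_crossentropy'])
```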

Training InfoGAN

Training a GAN model is always a difficult task, and careful hyperparameter tuning is required. We will use the following steps to train the InfoGAN model (a training-loop sketch follows the list).

  1. Normalize the input images from the MNIST dataset.
  2. Sample a batch of real images from the MNIST dataset.
  3. Train the discriminator model on these real images, labeled as real.
  4. Train the discriminator model on fake images generated by the generator network, labeled as fake.
  5. Train the auxiliary network on fake images generated from random latent codes, with those codes as targets.
  6. Train the generator network through the combined model, without training the discriminator.
  7. Repeat steps 2-6 for some number of iterations. I have trained it for 60000 iterations.
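A minimal training-loop sketch implementing these steps, continuing the code above (the batch size is an illustrative choice):

```python
# Step 1: load and normalize the images to [-1, 1] to match the tanh output.
(X_train, _), _ = mnist.load_data()
X_train = (X_train.astype('float32') - 127.5) / 127.5
X_train = np.expand_dims(X_train, axis=-1)

batch = 64
for step in range(60000):
    # Steps 2-3: train the discriminator on a batch of real images.
    real = X_train[np.random.randint(0, len(X_train), batch)]
    d_loss_real = discriminator.train_on_batch(real, np.ones((batch, 1)))

    # Step 4: train the discriminator on fake images.
    z = np.random.normal(0, 1, (batch, noise_dim))
    c = tf.keras.utils.to_categorical(
        np.random.randint(0, code_dim, batch), code_dim)
    fake = generator.predict(np.concatenate([z, c], axis=1), verbose=0)
    d_loss_fake = discriminator.train_on_batch(fake, np.zeros((batch, 1)))

    # Step 5: train the auxiliary network to recover the latent codes.
    q_loss = auxiliary.train_on_batch(fake, c)

    # Step 6: train the generator through the combined model.
    z = np.random.normal(0, 1, (batch, noise_dim))
    c = tf.keras.utils.to_categorical(
        np.random.randint(0, code_dim, batch), code_dim)
    g_loss = combined.train_on_batch(np.concatenate([z, c], axis=1),
                                     [np.ones((batch, 1)), c])
```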

Generation

Now we will generate images from the trained GAN model. The generator is given random noise along with the one-hot encoded digit (0-9) we want it to generate.
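A sketch of the generation step, assuming the generator trained above; the chosen digit and the matplotlib display are illustrative:

```python
import matplotlib.pyplot as plt

digit = 7  # any digit 0-9 we want to generate
z = np.random.normal(0, 1, (1, noise_dim))
c = tf.keras.utils.to_categorical([digit], code_dim)
img = generator.predict(np.concatenate([z, c], axis=1), verbose=0)

plt.imshow((img[0, :, :, 0] + 1) / 2.0, cmap='gray')  # rescale to [0, 1]
plt.axis('off')
plt.show()
```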

Here are the generated results from the model:

Referenced Research Paper: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Implementation of GANs to Generate Handwritten Digits

In the previous blog, we studied GANs; now, in this blog, we will implement a GAN to generate MNIST handwritten digits.

In generative adversarial networks, the generator and the discriminator are trained simultaneously, and either network can overpower the other if training is not balanced. If the discriminator is trained too much, it will easily detect fake versus real images, and the generator will not be able to generate real-looking images. If the generator is trained too heavily, the discriminator will not be able to classify between real and fake images. We can address this problem by properly setting the learning rates for both networks.

When we train the discriminator we do not train the generator, and when we train the generator we do not train the discriminator. This allows the generator to train properly. Now, let's look into the code for each part of the GAN network.

Discriminator Network:

We are using the MNIST digits dataset, which has images of shape (28, 28, 1). Since the image size is small, we can use an MLP network for the discriminator instead of convolutional layers. To do this, we first reshape each input image into a single vector of size 784. Then I have applied three dense layers of 512, 256 and 128 hidden units.
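A sketch of this MLP discriminator, assuming tf.keras (activation choices such as the LeakyReLU slope are assumptions):

```python
from tensorflow.keras import layers, models

def build_discriminator():
    model = models.Sequential(name='discriminator')
    model.add(layers.Flatten(input_shape=(28, 28, 1)))  # (28, 28, 1) -> 784
    model.add(layers.Dense(512))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dense(256))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dense(128))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dense(1, activation='sigmoid'))    # real (1) vs fake (0)
    return model
```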

Generator Network:

To create the generator network we first take random noise as input, with shape (100,). Then I have used three hidden layers of 256, 512 and 1024 units. The output of the generator network is then reshaped to (28, 28, 1). I have used batch normalization in each hidden layer; batch normalization improves the quality of the trained model and also stabilizes the training process.
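A matching generator sketch (the tanh output activation is an assumption, paired with scaling the training images to [-1, 1]):

```python
def build_generator():
    model = models.Sequential(name='generator')
    model.add(layers.Dense(256, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dense(512))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dense(1024))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dense(28 * 28, activation='tanh'))  # pixel values in [-1, 1]
    model.add(layers.Reshape((28, 28, 1)))
    return model
```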

Combined Model:

To train the generator we need to create a combined model in which we do not train the discriminator model. In the combined model, random noise is given as input to the generator network, and the generated image is then passed through the discriminator network to get the real/fake label. Here I have flagged the discriminator model as non-trainable.
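A sketch of the combined model; the discriminator is compiled first so it stays trainable on its own, then frozen inside the combined model (the trainable flag takes effect at compile time in Keras):

```python
from tensorflow.keras import optimizers

discriminator = build_discriminator()
discriminator.compile(optimizer=optimizers.Adam(2e-4, 0.5),
                      loss='binary_crossentropy', metrics=['accuracy'])

generator = build_generator()

# Frozen only inside the combined model; standalone training still works.
discriminator.trainable = False
z = layers.Input(shape=(100,))
validity = discriminator(generator(z))
combined = models.Model(z, validity)
combined.compile(optimizer=optimizers.Adam(2e-4, 0.5),
                 loss='binary_crossentropy')
```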

Training the GAN network:

Training a GAN network requires careful hyperparameter tuning; if the model is not trained carefully it will not converge to produce good results. We will use the following steps to train this GAN network (a training-loop sketch follows the list):

  1. Firstly, normalize the input dataset (MNIST images).
  2. Train the discriminator with real images (from the MNIST dataset).
  3. Sample the same number of noise vectors and predict the outputs from the generator network (the generator is not trained here).
  4. Train the discriminator network with the fake images generated in the previous step.
  5. Take new random noise samples and train the generator through the combined model, without training the discriminator.
  6. Repeat steps 2-5 for some number of iterations. I have trained it for 30000 iterations.
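A minimal training-loop sketch implementing these steps, continuing the code above (the batch size is an illustrative choice):

```python
import numpy as np
from tensorflow.keras.datasets import mnist

# Step 1: load and normalize the images to [-1, 1] to match the tanh output.
(X_train, _), _ = mnist.load_data()
X_train = (X_train.astype('float32') - 127.5) / 127.5
X_train = np.expand_dims(X_train, axis=-1)

batch = 64
for step in range(30000):
    # Step 2: train the discriminator on real images.
    real = X_train[np.random.randint(0, len(X_train), batch)]
    d_loss_real = discriminator.train_on_batch(real, np.ones((batch, 1)))

    # Step 3: generate fake images (the generator is not trained here).
    noise = np.random.normal(0, 1, (batch, 100))
    fake = generator.predict(noise, verbose=0)

    # Step 4: train the discriminator on the fake images.
    d_loss_fake = discriminator.train_on_batch(fake, np.zeros((batch, 1)))

    # Step 5: train the generator through the combined model.
    noise = np.random.normal(0, 1, (batch, 100))
    g_loss = combined.train_on_batch(noise, np.ones((batch, 1)))
```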

Take a look at the images generated by this GAN network.

Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.