In this tutorial, we will learn how to generate images of handwritten digits using a deep convolutional generative adversarial network (DCGAN).
What are GANs?
GANs are one of the most interesting ideas in deep learning today. In a GAN, two networks work adversarially. One is the generator network, which tries to generate new images that look similar to the original image dataset. The other is the discriminator network, which discriminates between real images (images from the dataset) and fake images (images produced by the generator).
During training, the generator progressively becomes better at producing images that cannot be distinguished from real ones, while the discriminator becomes more accurate at telling them apart. Training is complete when the discriminator can no longer distinguish the generator's images from real images.
I would recommend going through this blog to learn more about generative adversarial networks. Now we will implement a deep convolutional GAN using the MNIST handwritten digits dataset.
Import All Libraries
```python
from keras.layers import Input, Dense, Reshape, BatchNormalization, LeakyReLU, Conv2DTranspose, Conv2D, AveragePooling2D, Flatten
from keras.models import Model
from keras.optimizers import Adam
from keras.datasets import mnist
import numpy as np
import time
```
Initialization
```python
def __init__(self):
    (self.x_train, self.y_train), (self.x_test, self.y_test) = mnist.load_data()
    self.batch_size = 128
    self.half_batch_size = 64
    self.latent_dim = 100
    self.iterations = 30000
    self.optimizer = Adam(0.0002, 0.5)
    self.generator_model = self.generator()
    self.discriminator_model = self.discriminator()
    self.combined_model = self.combined()
```
Generator Network
The generator network takes random noise as input and generates meaningful images that look similar to real ones. The input is a noise vector of size 100 (the latent dimension). The output images have shape (28, 28, 1), the same shape as the images in the MNIST dataset.
In the generator network we use transposed convolution (deconvolution) layers to upsample the input to the image size. While convolutional layers try to extract useful features, transposed convolutional layers add detail while upsampling an image. To know more about deconvolution, you can read this blog. I have also added batch normalization layers to improve the quality of the model and stabilize training. For this network I used binary cross-entropy loss and the Adam optimizer. Here is the code.
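As a quick sanity check on the upsampling path: with `padding='same'`, a stride-2 transposed convolution in Keras simply multiplies the spatial size by the stride, so two such layers take the 7×7 feature map to 28×28. Here is a minimal sketch of that size arithmetic in plain Python (the helper function is my own, not part of Keras):

```python
def conv_transpose_out_size(in_size, stride, padding='same', kernel=3):
    """Spatial output size of a transposed convolution.

    With 'same' padding Keras produces in_size * stride;
    with 'valid' padding it produces (in_size - 1) * stride + kernel.
    """
    if padding == 'same':
        return in_size * stride
    return (in_size - 1) * stride + kernel

# the generator's upsampling path: 7 -> 14 -> 28
size = 7
for _ in range(2):
    size = conv_transpose_out_size(size, stride=2)
print(size)  # 28
```

This is why the `Dense` layer below is reshaped to (7, 7, 128): two stride-2 `Conv2DTranspose` layers are exactly what is needed to reach the 28×28 MNIST resolution.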
```python
def generator(self):
    input_gen = Input(shape=(self.latent_dim,))
    dense1 = Reshape((7, 7, 128))(Dense(7 * 7 * 128)(input_gen))
    batch_norm_1 = BatchNormalization()(dense1)
    # upsample 7x7 -> 14x14
    trans_1 = Conv2DTranspose(128, 3, padding='same', activation=LeakyReLU(alpha=0.2), strides=(2, 2))(batch_norm_1)
    batch_norm_2 = BatchNormalization()(trans_1)
    # upsample 14x14 -> 28x28
    trans_2 = Conv2DTranspose(64, 3, padding='same', activation=LeakyReLU(alpha=0.2), strides=(2, 2))(batch_norm_2)
    output = Conv2D(1, (28, 28), activation='tanh', padding='same')(trans_2)
    gen_model = Model(input_gen, output)
    gen_model.compile(loss='binary_crossentropy', optimizer=self.optimizer)
    print(gen_model.summary())
    return gen_model
```
Discriminator Network
The discriminator network discriminates between real and fake images, so it is a binary classification network. It consists of:
- an input layer of shape (28, 28, 1),
- three convolutional hidden layers with 16, 32, and 64 filters, and
- an output layer of size 1.
I have also used a batch normalization layer after every convolutional layer to stabilize the network. To downsample, I have used average pooling instead of max pooling. Finally, I compiled the model with binary cross-entropy loss and the Adam optimizer. Here is the code.
```python
def discriminator(self):
    input_disc = Input(shape=(28, 28, 1))
    conv_1 = Conv2D(16, 3, padding='same', activation=LeakyReLU(alpha=0.2))(input_disc)
    batch_norm1 = BatchNormalization()(conv_1)
    pool_1 = AveragePooling2D(strides=(2, 2))(batch_norm1)
    conv_2 = Conv2D(32, 3, padding='same', activation=LeakyReLU(alpha=0.2))(pool_1)
    batch_norm2 = BatchNormalization()(conv_2)
    pool_2 = AveragePooling2D(strides=(2, 2))(batch_norm2)
    conv_3 = Conv2D(64, 3, padding='same', activation=LeakyReLU(alpha=0.2))(pool_2)
    batch_norm3 = BatchNormalization()(conv_3)
    # pool the batch-normalized output (the original code mistakenly pooled conv_3)
    pool_3 = AveragePooling2D(strides=(2, 2))(batch_norm3)
    flatten_1 = Flatten()(pool_3)
    output = Dense(1, activation='sigmoid')(flatten_1)
    disc_model = Model(input_disc, output)
    disc_model.compile(loss='binary_crossentropy', optimizer=self.optimizer, metrics=['accuracy'])
    print(disc_model.summary())
    return disc_model
```
Combined Model
After creating the generator and discriminator networks, we need a combined model of both in order to train the generator. This combined model takes random noise as input, generates images with the generator, and predicts their labels with the discriminator. The resulting gradients are used to train the generator network only; in this model the discriminator is frozen and not trained. Here is the code.
```python
def combined(self):
    inputs = Input(shape=(self.latent_dim,))
    gen_img = self.generator_model(inputs)
    self.discriminator_model.trainable = False
    outs = self.discriminator_model(gen_img)
    comb_model = Model(inputs, outs)
    comb_model.compile(loss='binary_crossentropy', optimizer=self.optimizer, metrics=['accuracy'])
    print(comb_model.summary())
    return comb_model
```
Training the GAN Model
To train a GAN we first normalize the inputs to the range [-1, 1]. Then we train the model for a large number of iterations using the following steps.
- Sample a half batch of real images from the normalized MNIST dataset and train the discriminator on them with label 1 (real images).
- Generate a half batch of samples from the generator network and train the discriminator on them with label 0 (fake images).
- Generate random noise equal to the full batch size and train the generator through the combined model with label 1.
- Repeat steps 1 to 3 for a number of iterations. Here I have used 30,000 iterations.
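The preprocessing behind these steps is easy to verify in isolation: the transform `(x - 127.5) / 127.5` maps uint8 pixel values from [0, 255] into [-1, 1], matching the tanh output range of the generator. A minimal NumPy sketch using a random stand-in batch (not real MNIST data):

```python
import numpy as np

# stand-in for a half batch of MNIST images: uint8 pixels in [0, 255]
half_batch_size = 64
fake_batch = np.random.randint(0, 256, size=(half_batch_size, 28, 28), dtype=np.uint8)

# the same normalization used in train(): [0, 255] -> [-1, 1]
normalized = (fake_batch.astype(np.float32) - 127.5) / 127.5
normalized = np.expand_dims(normalized, -1)  # add channel axis -> (64, 28, 28, 1)

print(normalized.shape)  # (64, 28, 28, 1)

# labels used by the training steps above
real_labels = np.ones((half_batch_size, 1))   # step 1: real images -> 1
fake_labels = np.zeros((half_batch_size, 1))  # step 2: generated images -> 0
```

Note that the channel axis must be added explicitly because `mnist.load_data()` returns arrays of shape (N, 28, 28), while the discriminator expects (28, 28, 1) inputs.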
```python
def train(self):
    # normalize pixel values from [0, 255] to [-1, 1]
    train_data = (self.x_train.astype(np.float32) - 127.5) / 127.5
    train_data = np.expand_dims(train_data, -1)
    for i in range(self.iterations):
        # step 1 & 2: train the discriminator on half batches of real and fake images
        batch_indx = np.random.randint(0, train_data.shape[0], size=(self.half_batch_size))
        batch_x = train_data[batch_indx]
        input_noise = np.random.normal(0, 1, size=(self.half_batch_size, self.latent_dim))
        gen_outs = self.generator_model.predict(input_noise)
        fake_loss = self.discriminator_model.train_on_batch(gen_outs, np.zeros((self.half_batch_size, 1)))
        real_loss = self.discriminator_model.train_on_batch(batch_x, np.ones((self.half_batch_size, 1)))
        disc_loss = 0.5 * np.add(fake_loss, real_loss)
        # step 3: train the generator through the combined model with "real" labels
        full_batch_input_noise = np.random.normal(0, 1, size=(self.batch_size, self.latent_dim))
        gan_loss = self.combined_model.train_on_batch(full_batch_input_noise, np.array([1] * self.batch_size))
        print(i, disc_loss, gan_loss)
```
Generating New Images from the Trained Generator
Now that our model has been trained, we can discard the discriminator network and use the generator to produce new images. We take random noise as input and generate images, then rescale them back to [0, 1] to display the outputs. Here is the code.
```python
# generating new images from the trained network
import matplotlib.pyplot as plt

r, c = 5, 5
noise = np.random.normal(0, 1, (r * c, 100))
gen_imgs = gan.generator_model.predict(noise)

# rescale images from [-1, 1] to [0, 1]
gen_imgs = 0.5 * gen_imgs + 0.5

fig, axs = plt.subplots(r, c)
cnt = 0
for i in range(r):
    for j in range(c):
        axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
        axs[i, j].axis('off')
        cnt += 1
plt.show()
fig.savefig("mnist.png")
plt.close()
```
So, this was the implementation of a DCGAN using the MNIST dataset. In upcoming blogs we will learn about other GAN variants.
I hope you enjoyed reading.
If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or to improve. Good-bye until next time.