Semi-supervised learning aims to make use of a large amount of unlabeled data to boost the performance of a model that has only a small amount of labeled data. These types of models are very useful when collecting labeled data is cumbersome and expensive. Several semi-supervised deep learning models have performed quite well on standard benchmarks. In this blog, we will learn how GANs can help in semi-supervised learning.
If you are new to GANs, you should first read this blog: An Introduction to Generative Adversarial Networks. In a GAN, we train two networks adversarially: a generator and a discriminator. After training, we usually discard the discriminator and use only the generator to produce new data. In the semi-supervised model it is the other way around: after training we discard the generator and keep the discriminator. But here the discriminator is designed differently.
In a semi-supervised GAN (SGAN), the discriminator is trained not only to discriminate between real and fake data but also to predict the label of the input image. Take the MNIST dataset as an example. MNIST contains handwritten digits from 0 to 9, a total of 10 classes. In a semi-supervised GAN for MNIST, the discriminator is trained both to tell real images from fake ones and to predict these 10 classes.
So in an SGAN, the discriminator is trained on three types of data:
- Fake images generated by the generator network.
- Real images from the dataset without labels (a large amount of unlabeled data).
- Real images from the dataset with labels (a small amount of labeled data).
The generator in an SGAN is trained in the same way as in a vanilla GAN. This kind of training allows the model to learn useful features from the large unlabeled dataset and use these features in the supervised discriminator to predict the labels of input images.
Implementing Semi-Supervised GAN
Now we will implement a semi-supervised GAN using the MNIST digits dataset. If you want to implement a simple GAN first, you can follow this blog: Implementation of GANs to generated Handwritten Digits.
The MNIST dataset consists of 60000 training images, of which we will use only 1000 as labeled images and treat the rest as unlabeled. We will randomly select these 1000 labeled images so that each of the 10 classes contributes 100 images. Let's see the code for this:
def sample_1000(self, x, y):
    # Draw 100 random examples from each of the 10 classes (1000 labeled images in total).
    x_1000 = []
    y_1000 = []
    for i in range(10):
        x_i = x[y == i]
        ix = np.random.randint(0, len(x_i), 100)
        for j in ix:
            x_1000.append(x_i[j])
            y_1000.append(i)
    return x_1000, y_1000
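As a quick sanity check you can count how many images each class contributes. This is a small sketch, assuming a gan = GAN() instance of the class shown in the full code at the end of the post:

# 'gan' is a hypothetical instance of the GAN class from the full code listing.
x_1000, y_1000 = gan.sample_1000(gan.x_train, gan.y_train)
print(len(x_1000))           # 1000 sampled images
print(np.bincount(y_1000))   # 100 draws per class: [100 100 ... 100]
# Note: np.random.randint samples with replacement, so a few duplicate images are possible.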
Discriminator in SGAN
For this semi-supervised GAN model, we will create two discriminator models that share the weights of every layer except the output layer. One model is a binary classifier (it discriminates between real and fake images) and the other is a multi-class classifier (it predicts the label of the input image). Let's see the code for this:
def discriminator(self):
    input_disc = Input(shape=(784,))
    # Shared trunk of three fully connected layers.
    hidden1 = Dense(512, activation='relu')(input_disc)
    hidden2 = Dense(256, activation='relu')(hidden1)
    hidden3 = Dense(128, activation='relu')(hidden2)
    # Two output heads: real/fake (sigmoid) and 10-class digit label (softmax).
    output = Dense(1, activation='sigmoid')(hidden3)
    output2 = Dense(10, activation='softmax', name='classification_layer')(hidden3)
    disc_model = Model(input_disc, output)
    disc_model_2 = Model(input_disc, output2)
    disc_model.compile(loss=['binary_crossentropy'], optimizer=self.optimizer, metrics=['accuracy'])
    disc_model_2.compile(loss=['categorical_crossentropy'], optimizer=self.optimizer, metrics=['accuracy'])
    print(disc_model.summary())
    print(disc_model_2.summary())
    return disc_model, disc_model_2
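A quick way to convince yourself that the two models really share weights is to check that their hidden layers are the same objects. This is only an illustrative sketch, again assuming a gan = GAN() instance as built in the full code:

# layers[1:4] are the three hidden Dense layers; they are the very same objects
# in both models, so a gradient step on either model updates the shared trunk.
shared = zip(gan.discriminator_model.layers[1:4], gan.classification_model.layers[1:4])
print([a is b for a, b in shared])   # expected: [True, True, True]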
Generator in SGAN
The generator in this SGAN is a simple multi-layer neural network with three hidden layers of 256, 512 and 1024 units. The output layer produces an image of the original shape (28, 28, 1). The input to the generator is a random noise vector of size 100. Here is the code.
def generator(self):
    input_gen = Input(shape=(self.latent_dim,))
    # Three fully connected hidden layers, each followed by batch normalization.
    hidden1 = BatchNormalization(momentum=0.8)(Dense(256, activation='relu')(input_gen))
    hidden2 = BatchNormalization(momentum=0.8)(Dense(512, activation='relu')(hidden1))
    hidden3 = BatchNormalization(momentum=0.8)(Dense(1024, activation='relu')(hidden2))
    # tanh output in [-1, 1], reshaped to the original image shape.
    output = Dense(784, activation='tanh')(hidden3)
    reshaped_output = Reshape((28, 28, 1))(output)
    gen_model = Model(input_gen, reshaped_output)
    gen_model.compile(loss='binary_crossentropy', optimizer=self.optimizer)
    print(gen_model.summary())
    return gen_model
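Before training, you can sanity-check the generator by feeding it a random latent vector and plotting the (still meaningless) output. A minimal sketch, assuming a gan = GAN() instance from the full code and matplotlib installed:

import matplotlib.pyplot as plt

# 'gan' is a hypothetical instance of the GAN class from the full code listing.
noise = np.random.normal(0, 1, size=(1, gan.latent_dim))   # one latent vector of size 100
fake_img = gan.generator_model.predict(noise)              # shape (1, 28, 28, 1)
plt.imshow(fake_img[0, :, :, 0], cmap='gray')              # tanh output lies in [-1, 1]
plt.show()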
Training the model
Training this model will consist of the following steps:
- Sample both labeled and unlabeled data from the MNIST dataset, normalize the images, and convert the labels to categorical (one-hot) form.
- Train the multi-class discriminator model with a batch of labeled real images.
- Train the binary-class discriminator model with a batch of unlabeled real images.
- Sample a noise vector of size 100 and train the binary-class discriminator model with fake images generated by the generator network.
- Sample a noise vector of size 100 and train the combined model, which updates the generator network (a sketch of this combined model is shown below).
- Repeat steps 2-5 for a number of iterations. I have trained it for 10000 iterations.
In the training steps above, you can see that we train the multi-class discriminator and the binary-class discriminator in separate steps, but they actually share the weights of the same network except for the output layers (as mentioned earlier).
Also, the binary-class discriminator is trained twice in every iteration: once with real images taken from the dataset and once with fake images produced by the generator network. The multi-class discriminator is trained only once per iteration, with real labeled images, because class labels are not available for the generated images.
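Step 5 above updates the generator through a combined model: random noise goes into the generator, the generated image is flattened to 784 values and passed through the binary discriminator, whose weights are frozen during this step. This is the combined() method from the full code listing at the end of the post, shown here so the training loop below is easier to follow:

def combined(self):
    inputs = Input(shape=(self.latent_dim,))
    gen_img = self.generator_model(inputs)
    gen_img = Reshape((784,))(gen_img)
    # Freeze the discriminator so that only the generator is updated through this model.
    self.discriminator_model.trainable = False
    outs = self.discriminator_model(gen_img)
    comb_model = Model(inputs, outs)
    comb_model.compile(loss='binary_crossentropy', optimizer=self.optimizer, metrics=['accuracy'])
    print(comb_model.summary())
    return comb_model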
def train(self):
    # 1000 labeled images (100 per class) for the supervised classifier.
    train_data, train_data_y = self.sample_1000(self.x_train, self.y_train)
    train_data = ((np.array(train_data).astype(np.float32)) - 127.5) / 127.5   # scale to [-1, 1]
    train_data_y = to_categorical(train_data_y)
    # The full training set, treated as unlabeled data for the real/fake discriminator.
    all_train_data = ((np.array(self.x_train).astype(np.float32)) - 127.5) / 127.5
    all_train_data_y = to_categorical(self.y_train)
    for j in range(self.iterations):
        # Half-batch of labeled real images.
        batch_indx = np.random.randint(0, train_data.shape[0], size=(self.half_batch_size))
        batch_x = train_data[batch_indx].reshape((-1, 784))
        batch_y = train_data_y[batch_indx]
        # Half-batch of (unlabeled) real images.
        batch_indx_total = np.random.randint(0, all_train_data.shape[0], size=(self.half_batch_size))
        batch_x_total = all_train_data[batch_indx_total].reshape((-1, 784))
        batch_y_total = all_train_data_y[batch_indx_total]
        # Half-batch of fake images from the generator.
        input_noise = np.random.normal(0, 1, size=(self.half_batch_size, 100))
        gen_outs = self.generator_model.predict(input_noise).reshape((-1, 784))
        # Train the multi-class head on labeled real images.
        classi_loss = self.classification_model.train_on_batch(batch_x, batch_y)
        # Train the binary head on unlabeled real images and on fake images.
        real_loss1 = self.discriminator_model.train_on_batch(batch_x_total, np.ones((self.half_batch_size, 1)))
        fake_loss = self.discriminator_model.train_on_batch(gen_outs, np.zeros((self.half_batch_size, 1)))
        # Train the generator through the combined model (discriminator frozen).
        full_batch_input_noise = np.random.normal(0, 1, size=(self.batch_size, 100))
        gan_loss = self.combined_model.train_on_batch(full_batch_input_noise, np.array([1] * self.batch_size))
        # Evaluate classification accuracy on the test set every 1000 iterations.
        if j % 1000 == 0:
            test_data = ((self.x_test.astype(np.float32) - 127.5) / 127.5).reshape((-1, 784))
            test_results = self.classification_model.predict(test_data)
            test_results_argmax = np.argmax(test_results, axis=1)
            count = 0
            for i in range(len(test_results_argmax)):
                if test_results_argmax[i] == self.y_test[i]:
                    count += 1
            print("Accuracy After", j, "iterations: ", (count / len(test_data)) * 100)
I have also evaluated the SGAN model on the 10000 test images provided by MNIST after every 1000 iterations. Here is the result of that.
You can see that this SGAN model, trained with only 1000 labeled images, reaches an accuracy of about 94.8%, which is quite nice.
Give me the full code!
from keras.layers import Input, Dense, Reshape, BatchNormalization
from keras.models import Model
from keras.optimizers import Adam
from keras.datasets import mnist
from keras.utils import to_categorical
import numpy as np


class GAN():
    def __init__(self):
        (self.x_train, self.y_train), (self.x_test, self.y_test) = mnist.load_data()
        self.batch_size = 100
        self.half_batch_size = 50
        self.latent_dim = 100
        self.iterations = 10000
        self.optimizer = Adam(0.0002, 0.5)
        self.generator_model = self.generator()
        self.discriminator_model, self.classification_model = self.discriminator()
        self.combined_model = self.combined()

    def generator(self):
        input_gen = Input(shape=(self.latent_dim,))
        hidden1 = BatchNormalization(momentum=0.8)(Dense(256, activation='relu')(input_gen))
        hidden2 = BatchNormalization(momentum=0.8)(Dense(512, activation='relu')(hidden1))
        hidden3 = BatchNormalization(momentum=0.8)(Dense(1024, activation='relu')(hidden2))
        output = Dense(784, activation='tanh')(hidden3)
        reshaped_output = Reshape((28, 28, 1))(output)
        gen_model = Model(input_gen, reshaped_output)
        gen_model.compile(loss='binary_crossentropy', optimizer=self.optimizer)
        print(gen_model.summary())
        return gen_model

    def discriminator(self):
        input_disc = Input(shape=(784,))
        hidden1 = Dense(512, activation='relu')(input_disc)
        hidden2 = Dense(256, activation='relu')(hidden1)
        hidden3 = Dense(128, activation='relu')(hidden2)
        output = Dense(1, activation='sigmoid')(hidden3)
        output2 = Dense(10, activation='softmax', name='classification_layer')(hidden3)
        disc_model = Model(input_disc, output)
        disc_model_2 = Model(input_disc, output2)
        disc_model.compile(loss=['binary_crossentropy'], optimizer=self.optimizer, metrics=['accuracy'])
        disc_model_2.compile(loss=['categorical_crossentropy'], optimizer=self.optimizer, metrics=['accuracy'])
        print(disc_model.summary())
        print(disc_model_2.summary())
        return disc_model, disc_model_2

    def combined(self):
        inputs = Input(shape=(self.latent_dim,))
        gen_img = self.generator_model(inputs)
        gen_img = Reshape((784,))(gen_img)
        self.discriminator_model.trainable = False
        outs = self.discriminator_model(gen_img)
        comb_model = Model(inputs, outs)
        comb_model.compile(loss='binary_crossentropy', optimizer=self.optimizer, metrics=['accuracy'])
        print(comb_model.summary())
        return comb_model

    def sample_1000(self, x, y):
        x_1000 = []
        y_1000 = []
        for i in range(10):
            x_i = x[y == i]
            ix = np.random.randint(0, len(x_i), 100)
            for j in ix:
                x_1000.append(x_i[j])
                y_1000.append(i)
        return x_1000, y_1000

    def train(self):
        train_data, train_data_y = self.sample_1000(self.x_train, self.y_train)
        train_data = ((np.array(train_data).astype(np.float32)) - 127.5) / 127.5
        train_data_y = to_categorical(train_data_y)
        all_train_data = ((np.array(self.x_train).astype(np.float32)) - 127.5) / 127.5
        all_train_data_y = to_categorical(self.y_train)
        for j in range(self.iterations):
            batch_indx = np.random.randint(0, train_data.shape[0], size=(self.half_batch_size))
            batch_x = train_data[batch_indx].reshape((-1, 784))
            batch_y = train_data_y[batch_indx]
            batch_indx_total = np.random.randint(0, all_train_data.shape[0], size=(self.half_batch_size))
            batch_x_total = all_train_data[batch_indx_total].reshape((-1, 784))
            batch_y_total = all_train_data_y[batch_indx_total]
            input_noise = np.random.normal(0, 1, size=(self.half_batch_size, 100))
            gen_outs = self.generator_model.predict(input_noise).reshape((-1, 784))
            classi_loss = self.classification_model.train_on_batch(batch_x, batch_y)
            real_loss1 = self.discriminator_model.train_on_batch(batch_x_total, np.ones((self.half_batch_size, 1)))
            fake_loss = self.discriminator_model.train_on_batch(gen_outs, np.zeros((self.half_batch_size, 1)))
            full_batch_input_noise = np.random.normal(0, 1, size=(self.batch_size, 100))
            gan_loss = self.combined_model.train_on_batch(full_batch_input_noise, np.array([1] * self.batch_size))
            if j % 1000 == 0:
                test_data = ((self.x_test.astype(np.float32) - 127.5) / 127.5).reshape((-1, 784))
                test_results = self.classification_model.predict(test_data)
                test_results_argmax = np.argmax(test_results, axis=1)
                count = 0
                for i in range(len(test_results_argmax)):
                    if test_results_argmax[i] == self.y_test[i]:
                        count += 1
                print("Accuracy After", j, "iterations: ", (count / len(test_data)) * 100)


gan = GAN()
gan.train()
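Once training finishes, the multi-class discriminator can be reused on its own as a digit classifier. A minimal usage sketch (the variable names follow the class above):

gan = GAN()
gan.train()

# Classify a few test digits with the trained multi-class discriminator.
samples = ((gan.x_test[:5].astype(np.float32) - 127.5) / 127.5).reshape((-1, 784))
preds = np.argmax(gan.classification_model.predict(samples), axis=1)
print("predicted:", preds)
print("true:     ", gan.y_test[:5])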
Hope you enjoyed reading.
If you have any doubts or suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.