Variational autoencoders are an extension of autoencoders and are used as generative models: with them you can generate data such as text, images and even music.
Autoencoders are neural networks trained to reconstruct their original input. To know more about autoencoders, please go through this blog. They have certain applications, such as denoising and dimensionality reduction for data visualization, but apart from that they are fairly limited.
To overcome this limitation, variational autoencoders come into the picture. A plain autoencoder learns a deterministic mapping, so it is never trained to generate images from a particular distribution. Also, if you build a generative model from an autoencoder, you do not want it to reproduce the input exactly; you want outputs that mostly look like the input data but with some variation.
Variational Autoencoder Model
A variational autoencoder has an encoder and a decoder, much like an autoencoder. The difference is that instead of compressing the input into a single fixed code, the encoder learns a latent variable model: these latent variables define a probability distribution from which the input to the decoder is sampled. Another difference is that, instead of using only a mean squared error or cross-entropy loss (as autoencoders do), it has its own loss function.
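As a rough sketch of this idea in plain NumPy (outside Keras, with made-up numbers): the encoder outputs a mean and a log variance rather than a single code vector, and the decoder's input is drawn from that distribution via the so-called reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(0)

# pretend the encoder produced these for one image (made-up numbers)
mu = np.array([0.5, -1.0])        # mean of the latent distribution
log_var = np.array([-0.2, 0.1])   # log variance of the latent distribution

# reparameterization trick: z = mu + std * epsilon, with epsilon ~ N(0, 1)
epsilon = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * epsilon

print(z.shape)  # (2,) -- a different z on every draw, centred on mu
```

Each draw gives a slightly different `z`, which is exactly where the "variation" in a variational autoencoder's output comes from.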
I will not go further into the mathematics behind it; let's jump into the code, which will give more understanding about variational autoencoders. To know more about the mathematics, please go through this tutorial.
I have implemented a variational autoencoder in Keras using the MNIST dataset, so let's first download the data.
```python
# download training and test data from MNIST and reshape it
from keras.datasets import mnist

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train.astype('float32') / 255.
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.astype('float32') / 255.
X_test = X_test.reshape(-1, 28, 28, 1)
print(X_train.shape, X_test.shape)
```
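The same preprocessing can be checked on a dummy array without downloading MNIST. This is a NumPy-only illustration with fabricated data, not part of the model:

```python
import numpy as np

# fake batch of 10 grayscale 28x28 images with uint8 pixels, like raw MNIST
fake_images = np.random.randint(0, 256, size=(10, 28, 28), dtype=np.uint8)

scaled = fake_images.astype('float32') / 255.   # pixel values now in [0, 1]
reshaped = scaled.reshape(-1, 28, 28, 1)        # add the channel dimension

print(reshaped.shape)  # (10, 28, 28, 1)
```

The trailing `1` is the channel dimension that the `Conv2D` layers below expect.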
Now create an encoder network, just as you would for a plain autoencoder.
```python
# create encoder network
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

inputs = Input(shape=(28, 28, 1))
conv1 = Conv2D(16, (3, 3), activation='relu', padding='same')(inputs)
conv1_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(conv1)
pool1 = MaxPooling2D(pool_size=(2, 2), strides=2)(conv1_1)
conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2), strides=2)(conv2)
flat = Flatten()(pool2)
input_to_z = Dense(32, activation='relu')(flat)
```
Latent Distribution Parameters and Function
Now map the output of the encoder to the latent distribution parameters. Here I have created two parameters, mu and sigma, which represent the mean and the log variance of the latent distribution.
```python
from keras.models import Model

latent_dim = 2  # dimension of the latent variable
mu = Dense(latent_dim, name='mu')(input_to_z)
sigma = Dense(latent_dim, name='log_var')(input_to_z)  # log variance, not std dev
encoder = Model(inputs, mu)
```
Here I have taken the latent space dimension equal to 2. This is the bottleneck: we are compressing every input image down to just two variables. If we increase the latent dimension to 5, 10 or higher, we can get better results in the output, but more information then passes through the bottleneck.
Now create a sampling function based on a Gaussian distribution with mean zero and standard deviation one. This noise gives variation in the input to the decoder, which in turn produces variation in the output. The decoder then predicts the output from the sampled latent vector.
```python
# create latent distribution function and generate vectors
from keras import backend as K
from keras.layers import Lambda, Reshape, Conv2DTranspose

def sampling(args):
    mu, sigma = args
    epsilon = K.random_normal(shape=(K.shape(mu)[0], latent_dim),
                              mean=0., stddev=1.)
    # sigma holds the log variance, so exp(0.5 * sigma) is the standard deviation
    return mu + K.exp(0.5 * sigma) * epsilon

z = Lambda(sampling)([mu, sigma])

# create decoder network, which is roughly the reverse of the encoder
decoder_inputs = Input(K.int_shape(z)[1:])
dense_layer_d = Dense(7 * 7 * 32, activation='relu')(decoder_inputs)
output_from_z_d = Reshape((7, 7, 32))(dense_layer_d)
trans1_d = Conv2DTranspose(32, 3, padding='same', activation='relu',
                           strides=(2, 2))(output_from_z_d)
trans1_1_d = Conv2DTranspose(16, 3, padding='same', activation='relu',
                             strides=(2, 2))(trans1_d)
# sigmoid keeps pixel values in [0, 1] to match the binary cross-entropy loss
trans2_d = Conv2DTranspose(1, 3, padding='same', activation='sigmoid')(trans1_1_d)
decoder = Model(decoder_inputs, trans2_d)
z_decoded = decoder(z)
```
Loss Function
For the loss function, a variational autoencoder uses the sum of two losses. One is the generative (reconstruction) loss, a binary cross-entropy that measures how accurately the image is reconstructed. The other is the latent loss, a KL-divergence term that measures how closely the latent variables match a unit Gaussian; it makes sure the distribution produced by the encoder does not drift away from the origin. Then train the model.
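For a diagonal Gaussian N(mu, sigma²) measured against the unit Gaussian N(0, 1), the KL divergence has the closed form -1/2 · sum(1 + log(sigma²) - mu² - sigma²), which is the same expression the loss code uses (scaled by a small weight). A quick NumPy check with made-up latent parameters:

```python
import numpy as np

# made-up latent parameters for one sample (latent_dim = 2)
mu = np.array([0.3, -0.7])
log_var = np.array([0.2, -0.1])  # log of the variance
var = np.exp(log_var)

# closed-form KL( N(mu, var) || N(0, 1) )
kl = -0.5 * np.sum(1 + log_var - mu**2 - var)

print(kl)  # a small positive number; it is 0 only when mu = 0 and var = 1
```

The further the encoder's distribution drifts from the origin (or from unit variance), the larger this penalty grows, which is exactly the regularizing effect described above.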
```python
# calculate reconstruction loss and KL divergence in a custom layer
import keras

class CalcOutputWithLoss(keras.layers.Layer):
    def vae_loss(self, x, z_decoded):
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        xent_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        kl_loss = -5e-4 * K.mean(1 + sigma - K.square(mu) - K.exp(sigma), axis=-1)
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss(x, z_decoded)
        self.add_loss(loss, inputs=inputs)
        return x

outputs = CalcOutputWithLoss()([inputs, z_decoded])

# define the variational autoencoder model and train it
vae = Model(inputs, outputs)
m = 256       # batch size
n_epoch = 10
vae.compile(optimizer='adam', loss=None)  # loss is added inside the custom layer
vae.fit(X_train, epochs=n_epoch, batch_size=m,
        shuffle=True, validation_data=(X_test, None))
```
Our model is ready and we can generate images from it very easily. All we need to do is sample a latent variable from the distribution and pass it to the decoder. Let's test it with the following code:
```python
import numpy as np
import matplotlib.pyplot as plt

n = 15  # figure with 15x15 digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
grid_x = np.linspace(-1, 1, n)
grid_y = np.linspace(-1, 1, n)

for i, yi in enumerate(grid_y):
    for j, xi in enumerate(grid_x):
        z_sample = np.array([[xi, yi]])
        x_decoded = decoder.predict(z_sample)
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap='gray')
plt.show()
```
Here is the output generated from the sampled distribution in the above code.
The full code can be found here.
I hope you now understand the basics of variational autoencoders, and that you enjoyed reading.
If you have any doubts or suggestions, please feel free to ask and I will do my best to help or improve. Good-bye until next time.