
Implementation of CycleGAN for Image-to-image Translation

CycleGAN is a variant of the generative adversarial network and was introduced to perform image translation from domain X to domain Y without using a paired set of training examples. In the previous blog, I have already described CycleGAN in detail. In this blog, we will implement CycleGAN to translate apple images to orange images and vice-versa with the help of the Keras library. Here are some recommended blogs that you should refer to before implementing CycleGAN:

  1. Cycle-Consistent Generative Adversarial Networks (CycleGAN)
  2. Image to Image Translation Using Conditional GAN
  3. Implementation of Image-to-image translation using conditional GAN

Load and Preprocess the Dataset

Unlike many other image translation algorithms, CycleGAN does not require a paired dataset. Hence, we will use two separate datasets here: one consists of apple images and the other consists of orange images, and the two are not paired with each other. Here are some images from the dataset:

You can download the dataset from this link. Or run the following command from your terminal.

The dataset consists of four folders: trainA, trainB, testA, and testB. The ‘A’ folders contain apple images and the ‘B’ folders contain orange images. The training set consists of approximately 1000 images of each type and the test set of approximately 200 images of each type.

So, let’s first import all the required libraries:
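The exact import list is not preserved in this archive, so here is a minimal set that is sufficient for the sketches in this post (assuming TensorFlow 2.x with its bundled Keras):

    import os
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, Model
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.preprocessing.image import img_to_array, load_img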

The dataset is already partially preprocessed in that all images have the same size, (256, 256, 3). The other preprocessing steps we are going to use are normalization and random flipping: here we normalize every image to the range -1 to 1 and randomly flip it horizontally. Here is the code:
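Since the original snippet is not reproduced in this archive, here is a minimal sketch instead (the function name preprocess is mine):

    def preprocess(image):
        # Scale pixel values from [0, 255] to [-1, 1].
        image = tf.cast(image, tf.float32) / 127.5 - 1.0
        # Randomly flip the image horizontally.
        image = tf.image.random_flip_left_right(image)
        return image.numpy()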

Now load the training images from the trainA and trainB directories into arrays.
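One simple way to do this, assuming the dataset was extracted to an apple2orange/ folder (the folder name and the helper below are my own), is:

    def load_images(directory):
        # Read every image in the directory, resize it to 256x256 and preprocess it.
        images = []
        for name in sorted(os.listdir(directory)):
            img = img_to_array(load_img(os.path.join(directory, name), target_size=(256, 256)))
            images.append(preprocess(img))
        return np.array(images)

    train_A = load_images('apple2orange/trainA')   # apple images
    train_B = load_images('apple2orange/trainB')   # orange images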

Build the Generator

The network architecture that I have used is very similar to the architecture used in image-to-image translation with conditional GAN. The major difference is the loss function. In CycleGAN two more losses have been introduced. One is cycle consistency loss and the other is identity loss.

Here the generator network is a U-Net architecture: an encoder-decoder model with skip connections between the encoder and the decoder. We will use two generator networks. One translates from apple to orange (G: X -> Y) and the other translates from orange to apple (F: Y -> X). Each generator network consists of an encoder and a decoder. Each encoder block consists of three layers (Conv -> BatchNorm -> LeakyReLU), and each block in the decoder network consists of four layers (Transposed Conv -> BatchNorm -> Dropout -> ReLU). The generator takes an image as input and outputs a generated image; both images have a size of (256, 256, 3). Here is the code:
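The original code is not reproduced in this archive, so the following is a sketch of such a U-Net generator based on the description above (the filter sizes and the helper names encoder_block/decoder_block are my assumptions):

    def encoder_block(x, filters, norm=True):
        # Conv -> BatchNorm -> LeakyReLU, downsampling by a stride of 2.
        x = layers.Conv2D(filters, 4, strides=2, padding='same')(x)
        if norm:
            x = layers.BatchNormalization()(x)
        return layers.LeakyReLU(0.2)(x)

    def decoder_block(x, skip, filters, dropout=True):
        # Transposed Conv -> BatchNorm -> Dropout -> ReLU, then the skip connection.
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        if dropout:
            x = layers.Dropout(0.5)(x)
        x = layers.ReLU()(x)
        return layers.Concatenate()([x, skip])

    def build_generator():
        # U-Net generator: a (256, 256, 3) image in, a (256, 256, 3) image out.
        inp = layers.Input(shape=(256, 256, 3))
        e1 = encoder_block(inp, 64, norm=False)   # 128x128
        e2 = encoder_block(e1, 128)               # 64x64
        e3 = encoder_block(e2, 256)               # 32x32
        e4 = encoder_block(e3, 512)               # 16x16
        e5 = encoder_block(e4, 512)               # 8x8
        e6 = encoder_block(e5, 512)               # 4x4
        e7 = encoder_block(e6, 512)               # 2x2
        b = encoder_block(e7, 512)                # 1x1 bottleneck
        d1 = decoder_block(b, e7, 512)
        d2 = decoder_block(d1, e6, 512)
        d3 = decoder_block(d2, e5, 512)
        d4 = decoder_block(d3, e4, 512, dropout=False)
        d5 = decoder_block(d4, e3, 256, dropout=False)
        d6 = decoder_block(d5, e2, 128, dropout=False)
        d7 = decoder_block(d6, e1, 64, dropout=False)
        out = layers.Conv2DTranspose(3, 4, strides=2, padding='same', activation='tanh')(d7)
        return Model(inp, out)

    g_AB = build_generator()   # generator A: apples -> oranges (G: X -> Y)
    g_BA = build_generator()   # generator B: oranges -> apples (F: Y -> X)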

Build the Discriminator

The discriminator network is a PatchGAN, pretty similar to the one used in the code for image-to-image translation with conditional GAN. Here two discriminators will be used. One discriminator discriminates between orange images and the images generated by generator A, and the other discriminates between apple images and the images generated by generator B.

This PatchGAN is nothing but a convolutional network. The difference between a PatchGAN and a normal convolutional network is that, instead of producing a single scalar output, it generates an NxN array in which each element corresponds to a patch of the input image. These values are then averaged to classify the whole image as real or fake.
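Since the original snippet is not included here, the following is a sketch of such a PatchGAN discriminator (the layer configuration is my assumption; with these strides the output is a 16x16 grid of patch scores):

    def build_discriminator():
        # A stack of Conv -> BatchNorm -> LeakyReLU blocks ending in a patch of scores.
        inp = layers.Input(shape=(256, 256, 3))
        x = layers.Conv2D(64, 4, strides=2, padding='same')(inp)
        x = layers.LeakyReLU(0.2)(x)
        for filters in (128, 256, 512):
            x = layers.Conv2D(filters, 4, strides=2, padding='same')(x)
            x = layers.BatchNormalization()(x)
            x = layers.LeakyReLU(0.2)(x)
        # One score per patch; no sigmoid because mse (least-squares) loss is used later.
        out = layers.Conv2D(1, 4, padding='same')(x)   # shape (16, 16, 1)
        return Model(inp, out)

    d_A = build_discriminator()   # real vs. generated apple images
    d_B = build_discriminator()   # real vs. generated orange images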

Combined Network

Now we will create a combined network to train the generator model. Here both discriminators will be non-trainable. To train the generator network we will also use cycle consistency loss and identity loss.

Cycle consistency says that if we translate an English sentence to a French sentence and then translate it back to English, we should arrive at the original sentence. To calculate the cycle consistency loss, first pass input image A to generator A and then pass the predicted output to generator B. Now calculate the loss between the image generated by generator B and the original input image A. The same goes when taking image B as input to generator B.

In the case of identity loss: generator A is trained to take an image from domain A and generate an image that looks like it comes from domain B. The identity loss makes sure that if we instead pass an image from domain B to generator A, it still outputs an image from domain B, i.e. roughly the same image. Here is the code for the combined model.
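The combined model itself is not preserved in this archive; a sketch consistent with the description above (the function and variable names are mine) looks like this. Note that the discriminators should be compiled on their own before this composite model is built, so that freezing them here does not affect their individual training:

    def build_combined(g_AB, g_BA, d_A, d_B):
        # Freeze the discriminators inside this composite model only.
        d_A.trainable = False
        d_B.trainable = False
        img_A = layers.Input(shape=(256, 256, 3))   # apple image
        img_B = layers.Input(shape=(256, 256, 3))   # orange image
        fake_B = g_AB(img_A)                        # A -> B
        fake_A = g_BA(img_B)                        # B -> A
        rec_A = g_BA(fake_B)                        # A -> B -> A (cycle)
        rec_B = g_AB(fake_A)                        # B -> A -> B (cycle)
        same_A = g_BA(img_A)                        # identity mapping onto domain A
        same_B = g_AB(img_B)                        # identity mapping onto domain B
        valid_A = d_A(fake_A)                       # adversarial score for fake apples
        valid_B = d_B(fake_B)                       # adversarial score for fake oranges
        return Model([img_A, img_B],
                     [valid_A, valid_B, rec_A, rec_B, same_A, same_B])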

Loss, Optimizer and Compile the Models

Here we are using mse loss for the discriminator networks and mae loss for the generator network. The optimizer used here is Adam. The batch size for the network is 1 and the total number of epochs is 200.
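A sketch of the compilation step under these settings (the loss weights on the cycle consistency and identity terms are my assumption, following values commonly used for CycleGAN):

    # Compile the discriminators first, while they are still trainable.
    d_A.compile(loss='mse', optimizer=Adam(learning_rate=0.0002, beta_1=0.5))
    d_B.compile(loss='mse', optimizer=Adam(learning_rate=0.0002, beta_1=0.5))

    # Build and compile the combined model; the discriminators are frozen inside it.
    combined = build_combined(g_AB, g_BA, d_A, d_B)
    combined.compile(loss=['mse', 'mse', 'mae', 'mae', 'mae', 'mae'],
                     loss_weights=[1, 1, 10, 10, 5, 5],
                     optimizer=Adam(learning_rate=0.0002, beta_1=0.5))

    batch_size = 1
    epochs = 200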

Train the Network

  1. Generate an image from generator A using an image from domain A; similarly, generate an image from generator B using an image from domain B.
  2. Train discriminator A on a batch, using images from domain A as real images and the images generated by generator B as fake images.
  3. Train discriminator B on a batch, using images from domain B as real images and the images generated by generator A as fake images.
  4. Train the generators on a batch using the combined model.
  5. Repeat steps 1 to 4 for every image in the training dataset and then repeat this process for 200 epochs. A sketch of this training loop is shown below.
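Putting these steps together, a minimal training-loop sketch (variable names are mine; train_A and train_B are the arrays loaded earlier) could look like this:

    # Patch-shaped labels matching the discriminator output, e.g. (16, 16, 1) here.
    patch_shape = d_A.output_shape[1:]
    real = np.ones((batch_size,) + patch_shape)
    fake = np.zeros((batch_size,) + patch_shape)

    for epoch in range(epochs):
        for i in range(min(len(train_A), len(train_B))):
            img_A = train_A[i:i + 1]        # one apple image
            img_B = train_B[i:i + 1]        # one orange image

            # Step 1: generate images in both directions.
            fake_B = g_AB.predict(img_A)    # apple -> orange
            fake_A = g_BA.predict(img_B)    # orange -> apple

            # Steps 2 and 3: train the discriminators on real and generated images.
            d_A.train_on_batch(img_A, real)
            d_A.train_on_batch(fake_A, fake)
            d_B.train_on_batch(img_B, real)
            d_B.train_on_batch(fake_B, fake)

            # Step 4: train both generators through the combined model.
            combined.train_on_batch(
                [img_A, img_B],
                [real, real, img_A, img_B, img_A, img_B])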

Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Implementation of Image-to-image translation using conditional GAN

In the previous blog, we learned what image-to-image translation is. Also, we discussed how it can be performed using conditional GAN. Conditional GAN is a type of generative adversarial network where the discriminator and the generator networks are conditioned on some sort of auxiliary information. In image-to-image translation using conditional GAN, we take an image as the auxiliary information. With the help of this information, the generator tries to generate a new image. Let’s say we want to translate the edge image of a shoe to a real-looking image of a shoe. Here we can condition our GAN with the edge image.

To know more about conditional GAN and its implementation from scratch, you can read these blogs:

  1. Conditional Generative Adversarial Networks (CGAN): Introduction and Implementation
  2. Image to Image Translation Using Conditional GAN

Next, in this blog, we will implement image-to-image translation from scratch using the Keras functional API.

Dataset and Preprocessing

To implement an image-to-image translation model using conditional GAN, we need a paired dataset as shown in the below image.

The Center for Machine Perception (CMP) at the Czech Technical University in Prague provides a rich source of paired datasets for image-to-image translation which we can use here for our model. In this blog, we will use the edges-to-shoes dataset provided by this link. This dataset consists of a train and a validation set. The training set consists of 49825 images and the validation set consists of 200 images. The dataset consists of preprocessed images which contain the edge map and the shoe in a single image, as shown below:

These images have a size of (256, 512, 3), where 256 is the height, 512 is the width, and the number of channels is 3. Now, to split this image into an input and an output image, we can just slice it down the middle. After splitting, we also need to normalize the images. The pixel values lie between 0 and 255, and to make training faster and reduce the chances of getting stuck in a local minimum, we normalize these images to the range -1 to 1. Here is the code to preprocess the images.
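The original preprocessing snippet is not included in this archive, so here is a minimal sketch (the helper name load_pair is mine, and I assume the edge map occupies the left half of each combined file, as in the edges-to-shoes dataset):

    import numpy as np
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    def load_pair(path):
        # Read a combined (256, 512, 3) image and split it into its two halves.
        combined = img_to_array(load_img(path))
        edge, shoe = combined[:, :256, :], combined[:, 256:, :]
        # Normalize both halves from [0, 255] to [-1, 1].
        edge = edge / 127.5 - 1.0
        shoe = shoe / 127.5 - 1.0
        return edge, shoe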

In the preprocessing step we have only used the normalization technique. To preprocess the images we can also apply random jittering and random mirroring, as mentioned in the paper. To perform random jittering, you just need to upscale the image to 286×286 and then randomly crop it back to 256×256. To perform random mirroring, you need to flip the image horizontally.
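These augmentations are not part of the code used in this post, but a sketch of how they could be added (keeping the edge and shoe images aligned so they receive the same crop and flip) is:

    import tensorflow as tf

    def random_jitter(edge, shoe):
        # Stack the pair so both images get the same random crop and flip.
        images = tf.stack([edge, shoe])
        images = tf.image.resize(images, (286, 286))
        images = tf.image.random_crop(images, size=(2, 256, 256, 3))
        if tf.random.uniform(()) > 0.5:
            images = tf.image.flip_left_right(images)
        return images[0], images[1]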

Generator Network

The generator network for this conditional GAN architecture is a modified U-Net. This U-Net consists of an encoder-decoder network with skip connections between the encoder and the decoder. Each encoder block consists of three layers (Conv -> BatchNorm -> LeakyReLU); downsampling in the encoder is performed using strided convolutional layers. Each block in the decoder network consists of four layers (Transposed Conv -> BatchNorm -> Dropout -> ReLU), and dropout is only applied in the first three blocks of the decoder. The input shape for the network is (256, 256, 3), and the output shape is also (256, 256, 3), which is the generated image.

Normally in a generative adversarial network, the input to the generator is a noise vector. But here we will use a combination of a noise vector and the edge image as input to the generator. We take a noise vector of size 100, pass it through a dense layer, and reshape it so that it can be concatenated with the image input. Here is the code for the generator network. The model looks a little lengthy, but don’t worry, these are just repeated U-Net blocks for the encoder and decoder.
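The original generator code is not preserved here, so the following is a sketch based on the description above (projecting the 100-dimensional noise vector to a single 256x256 channel with a dense layer is my interpretation of the dense-and-reshape step):

    from tensorflow.keras import layers, Model

    def encoder_block(x, filters, norm=True):
        # Conv -> BatchNorm -> LeakyReLU, with strided convolution for downsampling.
        x = layers.Conv2D(filters, 4, strides=2, padding='same')(x)
        if norm:
            x = layers.BatchNormalization()(x)
        return layers.LeakyReLU(0.2)(x)

    def decoder_block(x, skip, filters, dropout=True):
        # Transposed Conv -> BatchNorm -> Dropout -> ReLU, plus the skip connection.
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        if dropout:
            x = layers.Dropout(0.5)(x)
        x = layers.ReLU()(x)
        return layers.Concatenate()([x, skip])

    def build_generator():
        # U-Net generator conditioned on an edge image plus a 100-dim noise vector.
        edge = layers.Input(shape=(256, 256, 3))
        noise = layers.Input(shape=(100,))
        # Project the noise to one 256x256 channel and attach it to the edge image.
        n = layers.Dense(256 * 256)(noise)
        n = layers.Reshape((256, 256, 1))(n)
        x = layers.Concatenate()([edge, n])

        e1 = encoder_block(x, 64, norm=False)
        e2 = encoder_block(e1, 128)
        e3 = encoder_block(e2, 256)
        e4 = encoder_block(e3, 512)
        e5 = encoder_block(e4, 512)
        e6 = encoder_block(e5, 512)
        e7 = encoder_block(e6, 512)
        b = encoder_block(e7, 512)

        d1 = decoder_block(b, e7, 512)                 # dropout only in the first
        d2 = decoder_block(d1, e6, 512)                # three decoder blocks
        d3 = decoder_block(d2, e5, 512)
        d4 = decoder_block(d3, e4, 512, dropout=False)
        d5 = decoder_block(d4, e3, 256, dropout=False)
        d6 = decoder_block(d5, e2, 128, dropout=False)
        d7 = decoder_block(d6, e1, 64, dropout=False)
        out = layers.Conv2DTranspose(3, 4, strides=2, padding='same', activation='tanh')(d7)
        return Model([edge, noise], out)

    generator = build_generator()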

Discriminator Network

Here the discriminator is a PatchGAN. A PatchGAN is basically a convolutional network where the input image is mapped to an NxN array instead of a single scalar. For this conditional GAN, the discriminator takes two inputs: one is the edge image and the other is the shoe image. Both inputs are of shape (256, 256, 3). The output shape of this network is (30, 30, 1), and each element of this 30×30 output classifies a 70×70 portion of the input image.

Each block in the discriminator consists of three layers (Conv -> BatchNorm -> LeakyReLU). I have used a Gaussian blurring layer to reduce the dominance of the discriminator while training. Here is the full code.
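The full code is not preserved in this archive; here is a sketch of such a PatchGAN discriminator that produces the (30, 30, 1) output described above. I have left out the Gaussian blurring layer, since it is a custom layer whose implementation is not shown here:

    from tensorflow.keras import layers, Model

    def build_discriminator():
        # Two image inputs: the edge map and the (real or generated) shoe image.
        edge = layers.Input(shape=(256, 256, 3))
        shoe = layers.Input(shape=(256, 256, 3))
        x = layers.Concatenate()([edge, shoe])

        x = layers.Conv2D(64, 4, strides=2, padding='same')(x)     # 128x128
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Conv2D(128, 4, strides=2, padding='same')(x)    # 64x64
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Conv2D(256, 4, strides=2, padding='same')(x)    # 32x32
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

        x = layers.ZeroPadding2D()(x)                               # 34x34
        x = layers.Conv2D(512, 4, strides=1, padding='valid')(x)    # 31x31
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.ZeroPadding2D()(x)                               # 33x33
        # Sigmoid output because binary cross-entropy loss is used for the discriminator.
        out = layers.Conv2D(1, 4, strides=1, padding='valid', activation='sigmoid')(x)  # (30, 30, 1)
        return Model([edge, shoe], out)

    discriminator = build_discriminator()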

Combined Network

Now we will create a combined network to train the generator model. This network takes the noise vector and the edge image as input and generates a new image using the generator network. The output of the generator and the edge image are then fed to the discriminator network to get its output. Here the discriminator will be non-trainable. Here is the network code.
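A sketch of this combined model (the function and variable names are mine; as in the CycleGAN post, the discriminator should be compiled separately before this composite is built):

    from tensorflow.keras import layers, Model

    def build_combined(generator, discriminator):
        # Freeze the discriminator inside this composite model only.
        discriminator.trainable = False
        edge = layers.Input(shape=(256, 256, 3))
        noise = layers.Input(shape=(100,))
        fake_shoe = generator([edge, noise])
        validity = discriminator([edge, fake_shoe])
        # Two outputs: the patch scores (adversarial term) and the generated image (mae term).
        return Model([edge, noise], [validity, fake_shoe])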

Training

I have used binary cross-entropy loss for the discriminator network. For the generator network, I have coupled the binary cross-entropy loss with mae loss. This is because, for image-to-image translation, the generator’s duty is not only to fool the discriminator but also to generate real-looking images. I have used the Adam optimizer for both the generator and the discriminator, the only difference being that I have kept a lower learning rate for the discriminator to make it less dominant while training. I have used a batch size of 1. Here are the steps to train the conditional GAN explained above (a sketch of the training loop follows the list).

  1. Train the discriminator model on real output images with patch labels of value 1.
  2. Train the discriminator model on images generated by the generator with patch labels of value 0.
  3. Train the generator network using the combined model.
  4. Repeat steps 1 to 3 for each image in the training dataset and then repeat all this for some number of epochs.
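Putting it all together, here is a minimal sketch of the compilation and training loop. The exact learning rates, the weight of 100 on the mae term, the epoch count, and the train_pairs list of preprocessed (edge, shoe) pairs are my assumptions:

    import numpy as np
    from tensorflow.keras.optimizers import Adam

    # Compile the discriminator first, with a lower learning rate to keep it less dominant.
    discriminator.compile(loss='binary_crossentropy',
                          optimizer=Adam(learning_rate=1e-5, beta_1=0.5))

    # Build and compile the combined model: adversarial loss plus mae reconstruction loss.
    combined = build_combined(generator, discriminator)
    combined.compile(loss=['binary_crossentropy', 'mae'],
                     loss_weights=[1, 100],
                     optimizer=Adam(learning_rate=2e-4, beta_1=0.5))

    real = np.ones((1, 30, 30, 1))     # patch labels for real pairs
    fake = np.zeros((1, 30, 30, 1))    # patch labels for generated pairs
    epochs = 100                       # assumed; the post does not state a count

    for epoch in range(epochs):
        for edge, shoe in train_pairs:              # hypothetical list of (edge, shoe) pairs
            edge = np.expand_dims(edge, 0)          # batch size of 1
            shoe = np.expand_dims(shoe, 0)
            noise = np.random.normal(0, 1, (1, 100))

            fake_shoe = generator.predict([edge, noise])

            # Steps 1 and 2: train the discriminator on the real and the generated pair.
            discriminator.train_on_batch([edge, shoe], real)
            discriminator.train_on_batch([edge, fake_shoe], fake)

            # Step 3: train the generator through the combined model.
            combined.train_on_batch([edge, noise], [real, shoe])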

Hope you enjoy reading.

If you have any doubts/suggestions please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.