
Implementation of CycleGAN for Image-to-image Translation

CycleGAN is a variant of the generative adversarial network that was introduced to perform image translation from domain X to domain Y without a paired set of training examples. In the previous blog, I have already described CycleGAN in detail. In this blog, we will implement CycleGAN to translate apple images to orange images and vice versa with the help of the Keras library. Here are some recommended blogs that you should refer to before implementing CycleGAN:

  1. Cycle-Consistent Generative Adversarial Networks (CycleGAN)
  2. Image to Image Translation Using Conditional GAN
  3. Implementation of Image-to-image translation using conditional GAN

Load and Preprocess the Dataset

Unlike other image translation algorithms, CycleGAN does not require a paired dataset. Hence, we will use two separate datasets: one consists of apple images and the other consists of orange images. The two datasets are not paired with each other. Here are some images from the dataset:

You can download the dataset from this link, or run the following command from your terminal.

The dataset consists of four folders: trainA, trainB, testA, and testB. The 'A' folders contain apple images and the 'B' folders contain orange images. The training set consists of approximately 1000 images of each type, and the test set consists of approximately 200 images of each type.

So, let’s first import all the required libraries:
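A minimal set of imports for such an implementation, assuming TensorFlow 2.x with its built-in Keras API (rather than the exact imports of the original post), is:

import os
import numpy as np
from tensorflow.keras import layers, Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import img_to_array, load_img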

The dataset is already partially preprocessed, as all images have the same size (256, 256, 3). The other preprocessing steps we will use are normalization and random flipping: every image is normalized to the range -1 to 1 and randomly flipped horizontally. Here is the code:
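Below is a small sketch of these two steps, assuming each image is a NumPy array of shape (256, 256, 3); it illustrates the idea rather than reproducing the post's exact code:

def normalize(image):
    # Scale pixel values from [0, 255] to [-1, 1].
    return (image / 127.5) - 1.0

def random_flip(image):
    # Flip the image horizontally with probability 0.5.
    if np.random.rand() < 0.5:
        image = np.fliplr(image)
    return image

def preprocess(image):
    return normalize(random_flip(image))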

Now load the training images from the directory into a list.
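A possible sketch of this loading step, assuming the extracted apple2orange folder layout described above (the directory names are illustrative):

def load_images(path):
    # Read every image in a folder and return a preprocessed array.
    images = []
    for name in sorted(os.listdir(path)):
        img = load_img(os.path.join(path, name), target_size=(256, 256))
        images.append(preprocess(img_to_array(img)))
    return np.array(images)

trainA = load_images('apple2orange/trainA')  # apple images
trainB = load_images('apple2orange/trainB')  # orange images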

Build the Generator

The network architecture that I have used is very similar to the architecture used in image-to-image translation with conditional GAN. The major difference is the loss function: in CycleGAN, two more losses are introduced, the cycle consistency loss and the identity loss.

Here the generator network is a U-Net architecture, which consists of an encoder-decoder model with skip connections between the encoder and decoder. We will use two generator networks: one translates apples to oranges (G: X -> Y) and the other translates oranges to apples (F: Y -> X). Each generator network consists of an encoder and a decoder. Each encoder block consists of three layers (Conv -> BatchNorm -> LeakyReLU), and each decoder block consists of four layers (Transposed Conv -> BatchNorm -> Dropout -> ReLU). The generator takes an image as input and outputs a generated image; both images have a size of (256, 256, 3). Here is the code:
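Below is a condensed sketch of such a U-Net generator using the Keras functional API; the number of blocks and the filter counts are illustrative rather than the post's exact values:

def encoder_block(x, filters):
    # Conv -> BatchNorm -> LeakyReLU, downsampling by a factor of 2.
    x = layers.Conv2D(filters, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(0.2)(x)

def decoder_block(x, skip, filters):
    # Transposed Conv -> BatchNorm -> Dropout -> ReLU, upsampling by 2,
    # then concatenating the matching encoder output (skip connection).
    x = layers.Conv2DTranspose(filters, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.ReLU()(x)
    return layers.Concatenate()([x, skip])

def build_generator():
    inp = layers.Input(shape=(256, 256, 3))
    e1 = encoder_block(inp, 64)    # 128x128
    e2 = encoder_block(e1, 128)    # 64x64
    e3 = encoder_block(e2, 256)    # 32x32
    e4 = encoder_block(e3, 512)    # 16x16
    d1 = decoder_block(e4, e3, 256)
    d2 = decoder_block(d1, e2, 128)
    d3 = decoder_block(d2, e1, 64)
    out = layers.Conv2DTranspose(3, 4, strides=2, padding='same',
                                 activation='tanh')(d3)   # back to (256, 256, 3)
    return Model(inp, out)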

Build the Discriminator

The discriminator network is a PatchGAN, quite similar to the one used in the code for image-to-image translation with conditional GAN. Two discriminators will be used: one discriminates between orange images and images generated by generator A, and the other discriminates between apple images and images generated by generator B.

This PatchGAN is simply a convolutional network. The difference between a PatchGAN and a normal convolutional classifier is that instead of producing a single scalar output, it generates an NxN array, where each element maps to a patch of the input image. The outputs are then averaged to classify the whole image as real or fake.
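A minimal PatchGAN sketch along these lines (layer and filter counts are illustrative) could look like this:

def build_discriminator():
    # A PatchGAN: strided convolutions whose final output is an NxN grid
    # of real/fake scores, one per patch of the input image.
    inp = layers.Input(shape=(256, 256, 3))
    x = layers.Conv2D(64, 4, strides=2, padding='same')(inp)
    x = layers.LeakyReLU(0.2)(x)
    for filters in (128, 256, 512):
        x = layers.Conv2D(filters, 4, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    # One score per patch; no activation since an MSE (least-squares) loss is used.
    out = layers.Conv2D(1, 4, padding='same')(x)
    return Model(inp, out)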

Combined Network

Now we will create a combined network to train the generator models. In this combined model, both discriminators will be non-trainable. To train the generator networks, we will also use the cycle consistency loss and the identity loss.

Cycle consistency says that if we translate an English sentence to French and then translate it back to English, we should arrive at the original sentence. To calculate the cycle consistency loss, first pass input image A to generator A and then pass the predicted output to generator B. Now calculate the loss between the image reconstructed by generator B and the original input image A. The same applies when image B is passed to generator B and the result is passed back through generator A.

In the case of identity loss: generator A takes images from domain A and tries to generate images that look like domain B, so the identity loss makes sure that when an image from domain B is passed to generator A, the output should still look like that domain B image. Here is the code for the combined model.
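The sketch below shows one way to wire up such a combined model, exposing the adversarial outputs, cycle reconstructions, and identity mappings as separate outputs; the names gen_A, gen_B, disc_A, and disc_B follow the convention used above (generator A: apple to orange, generator B: orange to apple) and are assumed rather than taken from the original post:

def build_combined(gen_A, gen_B, disc_A, disc_B):
    # Only the generators are updated through this model.
    disc_A.trainable = False
    disc_B.trainable = False

    img_A = layers.Input(shape=(256, 256, 3))   # apples
    img_B = layers.Input(shape=(256, 256, 3))   # oranges

    fake_B = gen_A(img_A)      # apple -> orange
    fake_A = gen_B(img_B)      # orange -> apple
    recon_A = gen_B(fake_B)    # cycle: apple -> orange -> apple
    recon_B = gen_A(fake_A)    # cycle: orange -> apple -> orange
    ident_A = gen_B(img_A)     # identity: an apple should stay an apple
    ident_B = gen_A(img_B)     # identity: an orange should stay an orange

    valid_A = disc_A(fake_A)   # adversarial signal for generator B
    valid_B = disc_B(fake_B)   # adversarial signal for generator A

    return Model([img_A, img_B],
                 [valid_A, valid_B, recon_A, recon_B, ident_A, ident_B])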

Loss, Optimizer and Compile the Models

Here we use MSE loss for the discriminator networks and MAE loss for the generator networks. The optimizer used is Adam. The batch size is 1 and the total number of epochs is 200.
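A sketch of building and compiling everything with the helper functions above: the adversarial outputs of the combined model use MSE to match the discriminators, the cycle and identity outputs use MAE, and the loss weights shown are illustrative:

gen_A, gen_B = build_generator(), build_generator()
disc_A, disc_B = build_discriminator(), build_discriminator()

# Discriminators are compiled (and trained) on their own with MSE loss.
disc_A.compile(loss='mse', optimizer=Adam(learning_rate=0.0002, beta_1=0.5))
disc_B.compile(loss='mse', optimizer=Adam(learning_rate=0.0002, beta_1=0.5))

# The combined model trains the generators only (discriminators frozen inside it).
combined = build_combined(gen_A, gen_B, disc_A, disc_B)
combined.compile(loss=['mse', 'mse', 'mae', 'mae', 'mae', 'mae'],
                 loss_weights=[1, 1, 10, 10, 1, 1],   # illustrative weights
                 optimizer=Adam(learning_rate=0.0002, beta_1=0.5))

batch_size = 1
epochs = 200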

Train the Network

  1. Generate an image from generator A using an image from domain A; similarly, generate an image from generator B using an image from domain B.
  2. Train discriminator A on a batch, using images from domain A as real images and images generated by generator B as fake images.
  3. Train discriminator B on a batch, using images from domain B as real images and images generated by generator A as fake images.
  4. Train both generators on a batch using the combined model.
  5. Repeat steps 1 to 4 for every image in the training dataset, and then repeat this process for 200 epochs. A compact sketch of this loop is shown after the list.
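A compact sketch of this training loop, reusing the hypothetical names from the earlier sketches (gen_A, gen_B, disc_A, disc_B, combined, trainA, trainB):

# Target labels have one entry per element of the PatchGAN output grid.
patch_shape = disc_A.output_shape[1:]
real = np.ones((batch_size,) + patch_shape)
fake = np.zeros((batch_size,) + patch_shape)

for epoch in range(epochs):
    for i in range(len(trainA)):
        img_A = trainA[i:i + 1]            # apple image
        j = i % len(trainB)
        img_B = trainB[j:j + 1]            # orange image

        # 1. Generate fake images with both generators.
        fake_B = gen_A.predict(img_A)      # apple -> orange
        fake_A = gen_B.predict(img_B)      # orange -> apple

        # 2. Train discriminator A: real apples vs. fakes from generator B.
        disc_A.train_on_batch(img_A, real)
        disc_A.train_on_batch(fake_A, fake)

        # 3. Train discriminator B: real oranges vs. fakes from generator A.
        disc_B.train_on_batch(img_B, real)
        disc_B.train_on_batch(fake_B, fake)

        # 4. Train both generators through the combined model.
        combined.train_on_batch(
            [img_A, img_B],
            [real, real, img_A, img_B, img_A, img_B])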

Hope you enjoy reading.

If you have any doubts or suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Cycle-Consistent Generative Adversarial Networks (CycleGAN)

In this blog, we will learn how to perform image-to-image translation using CycleGAN. Image-to-image translation is a type of computer vision problem where an image is transformed from one domain to another domain, say, edges to a photo.

Image-to-image translation generally requires a paired set of images to train a model, as we saw with conditional GANs. Take a look at a paired set of images for translating edges to a photo:

But in many cases, collecting a paired set of training data is quite difficult. Say we want an object transfiguration model that translates an image of a horse to an image of a zebra and vice versa.

For such tasks, even the desired output is not well defined, so how can we collect a paired set of images? To solve this problem, the authors proposed an approach called CycleGAN to transfer an image from the X domain to the Y domain without a paired set of examples.

Cycle Consistent GAN

A CycleGAN captures the special characteristics of one image domain and figures out how these characteristics can be translated into another image domain, all without paired training examples. Let's look at some unpaired training data.

Problem with these translations: With paired training examples, the network is supervised by corresponding label images. But with an unpaired training dataset, we can only supervise at the set level, where the sets are the X domain and the Y domain. To train such a network we need to find a mapping G: X → Y such that the outputs G(X) are indistinguishable from the Y domain. There are infinitely many such mappings G, which does not guarantee meaningful input-output image pairs. Sometimes this type of network also suffers from mode collapse, which occurs when all input images map to the same output image.

Cycle Consistent: To cope with the problem stated above, the authors of the paper proposed that the translation should be "cycle consistent". For example, if we translate an English sentence to French and then translate it back to English, we should arrive at the original sentence. Similarly, for images, if we translate an image from the X domain to the Y domain using a mapping G and then translate G(X) back to X using a mapping F, we should arrive back at the original image.

So, CycleGAN consists of two GAN networks, each with a generator and a discriminator. To train the network, it uses two adversarial losses and one cycle consistency loss. Let's see its mathematical formulation.

Mathematical Formulation of CycleGAN

Say we have two image domains X and Y. Our model includes two mappings, G: X → Y and F: Y → X, together with two adversarial discriminators, DX and DY. DX discriminates between F(Y) and X-domain images, while DY discriminates between G(X) and Y-domain images. We also use a cycle consistency loss to prevent the learned mappings G and F from contradicting each other.

In the figure above, (a) shows the two mappings G and F, while (b) and (c) define the forward cycle consistency loss ( x → G(x) → F(G(x)) ≈ x ) and the backward cycle consistency loss ( y → F(y) → G(F(y)) ≈ y ) respectively.

Network Architecture

There are two different architectures, one each for the generator and discriminator networks.

The generator network follows an encoder-decoder architecture with three main parts:

  1. Encoder
  2. Transformer
  3. Decoder

The encoder consists of three convolutional layers. An input image is passed through this encoder network and feature volumes are produced as output. The transformer consists of 6 residual blocks; it takes the feature volumes generated by the encoder as input and outputs transformed features. Finally, the decoder works as a set of deconvolutional (transposed convolution) layers; it takes the output of the transformer and generates a new image.
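To make the transformer stage concrete, here is a minimal sketch of a single residual block in Keras; the filter count is illustrative, and batch normalization stands in for the instance normalization used in the paper:

from tensorflow.keras import layers

def residual_block(x, filters=256):
    # Two 3x3 convolutions whose output is added back to the input,
    # so the block only has to learn a residual change to the features.
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    return layers.Add()([shortcut, y])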

The discriminator network is simple: it takes an image as input and predicts whether it belongs to the real dataset or is a fake generated image.


This discriminator network is basically a PatchGAN. A PatchGAN is a simple convolutional network; the only difference is that instead of mapping the input image to a single scalar output, it maps the input image to an NxN array of outputs, where each element corresponds to a patch of the input image. In CycleGAN, each output maps to a 70×70 patch of the image. Finally, we take the mean of this output and optimize it to classify the image as real or fake. The advantages of using a PatchGAN over a normal GAN discriminator are that it has fewer parameters and it can work with arbitrarily sized images.

Loss Function

Adversarial loss is applied to both mappings G and F through the discriminators DY and DX. These discriminator losses make sure the model is trained to generate data indistinguishable from real data in both image domains.

Adversarial losses alone cannot guarantee that the learned function maps an individual input x to the desired output y. Thus we also need the cycle consistency loss, which makes sure that the image translation cycle brings x back to the original image, i.e., x → G(x) → F(G(x)) ≈ x. The full loss can be written as follows:

L(G, F, DX, DY) = LGAN(G, DY, X, Y) + LGAN(F, DX, Y, X) + λ Lcyc(G, F)

The first two terms in the loss function are the adversarial losses for the two mappings, and the last term is the cycle consistency loss. λ defines the relative importance of the cycle consistency loss; the authors originally set it to 10.
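For completeness, the individual terms as defined in the referenced paper can be written out as below; note that in practice the authors replace the log-likelihood objective with a least-squares one, which is why the implementation above uses MSE for the adversarial losses:

\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log(1 - D_Y(G(x)))]

\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{\text{data}}(y)}[\lVert G(F(y)) - y \rVert_1]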

CycleGAN has produced compelling results in many cases, but it also has some limitations. That's all for the CycleGAN introduction. In the next blog, we will implement this algorithm in Keras.

Hope you enjoy reading.

If you have any doubts or suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Referenced Research Paper: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks