Tag Archives: generative adversarial network

GAN to Generate Images of Climate Change

Generative adversarial networks (GANs) are deep learning models used to generate images that resemble real ones. Images generated by GANs can be both realistic and personalized, but generating high-quality images requires a huge amount of data, which limits their usability when only a small dataset is available. In this blog, we will discuss how simulated data can be used alongside real data to generate images of climate change with GANs when training data is scarce.

Introduction

Recently, researchers at the Montreal Institute for Learning Algorithms (MILA) used generative adversarial networks to generate images of the world after a flood. They tried to show how the world would change if a calamity such as a flood occurred, in the hope that people will work to avert such futures if they can see these changes. The researchers used simulated data in combination with real images to train a multimodal unsupervised image-to-image translation (MUNIT) network with some modifications to its architecture.

Data Collection

Real Dataset

The researchers collected 2000 real images of flooded and non-flooded scenes taken in various weather conditions, seasons, times of day and viewpoints. These images were taken from the publicly available Mapillary and Flickr datasets. They first trained CycleGAN on this dataset, but the generated images were not sufficiently realistic. To cope with this problem they added simulated data.

Simulated Dataset

To generate the simulated dataset, the researchers used the Unity 3D game engine. They created different types of buildings in combination with urban and rural environments. As a starting point, they generated 1000 unique pairs of images in the flooded and non-flooded domains.

Domain Adaptation Technique

While using simulated data, the authors observed a domain gap between the training dataset made up of simulated data and the test data made up of real images. To bridge this gap they used a domain adaptation technique inspired by work on unsupervised semantic segmentation, implemented by adding an adversarial classifier within the MUNIT architecture.

Network Architecture

The researchers tried different image-to-image translation GANs such as CycleGAN, InstaGAN, and MUNIT. CycleGAN and InstaGAN were not able to generate water textures as realistic as those produced by MUNIT, so they finally used the MUNIT architecture with some modifications.

The MUNIT architecture relies on two generators and two discriminators to disentangle the style and content of the images, so that during generation only the style changes while the content remains the same. To make the MUNIT architecture more suitable for the climate-change use case, the researchers made the following changes to the architecture:

  1. Restriction of cycle consistency loss: In image-to-image translation GANs, cycle consistency loss is used to make sure that the translation is cycle consistent. For example, if we translate an English sentence to French and then translate it back to English, we should arrive at the original sentence. In this architecture, the researchers restricted the network’s cycle consistency loss so that it is only computed on regions that are not likely to be flooded. To do this they used binary masks of those areas (see the sketch after this list).
  2. Introduction of semantic consistency loss: This loss ensures that the semantic segmentation structure of the generated image is the same as that of the source image, except in the areas where changes occur, such as a road turning into a flooded area.
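
The paper does not ship reference code here, but a minimal sketch of a masked cycle consistency loss could look as follows (PyTorch; the flood mask, its shape and the normalization are assumptions made purely for illustration):

import torch

def masked_cycle_loss(x, x_reconstructed, flood_mask):
    # flood_mask: float tensor with 1 where flooding is plausible and 0 elsewhere,
    # broadcastable to the shape of x (an assumption of this sketch)
    keep = 1.0 - flood_mask                       # regions unlikely to be flooded
    diff = torch.abs(x - x_reconstructed) * keep  # L1 difference only on those regions
    return diff.sum() / keep.sum().clamp(min=1.0)

The semantic consistency loss could be sketched in the same spirit, comparing segmentation maps of the source and generated images outside the changed regions.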

This approach uses both real and simulated data to perform image-to-image translation showing the effects of climate change, and it clearly demonstrates that simulated data helps in generating more realistic images. The researchers are still working to improve the results of this model. They are also working to create an interactive website.

“Authors aim to develop an interactive website that, given a user-entered address, will query the Google Street View API (Anguelov et al., 2010) to get an image of the location and alter it to display a plausible image of its climate future based on the predictions of climate models. We hope this tool will help communicate effectively on climate change related risks.”

Referenced Research Paper: Using Simulated data to generate images of climate change

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Style Generative Adversarial Network (StyleGAN)

A generative adversarial network (GAN) generates synthetic images that are indistinguishable from authentic images. A GAN consists of a generator network and a discriminator network. The generator tries to generate new images from a noise vector, while the discriminator tries to distinguish these generated images from images in the original dataset. While training the GAN, the generator tries to fool the discriminator, and the discriminator tries to get better at differentiating between real and fake images. Training continues until the discriminator is fooled about half the time, which means the generator is producing data close to the original data distribution.
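
As a rough illustration of this adversarial game, a single training step could be sketched as follows (PyTorch; the generator G, discriminator D and their optimizers are assumed to be defined elsewhere, and D is assumed to return one logit per image):

import torch
import torch.nn.functional as F

def gan_training_step(G, D, opt_G, opt_D, real_images, z_dim=128):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator update: push real images towards 1 and generated images towards 0
    fake_images = G(torch.randn(batch, z_dim)).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real_images), real_labels)
              + F.binary_cross_entropy_with_logits(D(fake_images), fake_labels))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator update: try to fool the discriminator into predicting 1 for generated images
    g_loss = F.binary_cross_entropy_with_logits(D(G(torch.randn(batch, z_dim))), real_labels)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()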

Since the introduction of generative adversarial networks in 2014, there have been many improvements to the architecture: deep convolutional GAN, semi-supervised GAN, conditional GAN, CycleGAN and many more. These GAN variants mainly focus on improving the discriminator architecture, while the generator continues to operate as a black box.

StyleGAN proposes an alternative generator architecture that can control specific features of the output image, such as pose, identity, hair and freckles (when trained on a face dataset), without compromising image quality.

Baseline Architecture

The baseline architecture for StyleGAN is taken from another recently introduced GAN variant: Progressive GAN. In Progressive GAN, both the generator and the discriminator grow progressively: starting from a low resolution, layers are added to the model that capture increasingly fine details. Generation starts at 4×4 images and grows up to 1024×1024 images. This progressively growing architecture speeds up and stabilizes training, which helps in generating such high-quality images.

StyleGAN Architecture

Progressive GAN was able to generate high-quality images, but controlling specific features of the generated image was difficult with its architecture. To control the features of the output image, some changes were made to Progressive GAN’s generator architecture, and StyleGAN was created. Here is the architecture of the generator for StyleGAN.

Along with the generator’s architecture, the above figure also contrasts a traditional generator network with the style-based generator network. StyleGAN’s generator is obtained from Progressive GAN through a number of modifications, which we will discuss one by one.

1. Removal of Traditional Input Layer

In traditional generator networks, a latent vector is provided through an input layer. This latent vector must follow the probability density of the training data, which can lead to some degree of entanglement. For example, if the training data contains far more images of one type than of other variations, the generator is more likely to produce images with features of that over-represented type. So instead of a traditional input layer, the synthesis network (generator network) starts with a learned 4 × 4 × 512 constant tensor.

2. Mapping Network and AdaIN

The mapping network embeds the input latent code into an intermediate latent space, which can be used as a style and incorporated into each block of the synthesis network. As you can see in the above generator architecture, the latent code is fed through 8 fully connected layers to produce an intermediate latent vector w in the space W.
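
As a rough sketch, the mapping network is simply a stack of fully connected layers (PyTorch; the layer width of 512 matches the paper, while the activation choice here is an assumption):

import torch.nn as nn

def mapping_network(z_dim=512, w_dim=512, num_layers=8):
    layers = []
    in_dim = z_dim
    for _ in range(num_layers):
        layers += [nn.Linear(in_dim, w_dim), nn.LeakyReLU(0.2)]
        in_dim = w_dim
    return nn.Sequential(*layers)  # maps a latent code z to an intermediate latent vector w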

This intermediate latent vector w is then passed through a learned affine transformation “A” (shown in the architecture) and specialized into styles y = (ys, yb), which are incorporated into each block of the generator network. To incorporate a style into a block, the feature maps (xi) of that block are first normalized separately and then scaled and biased using the corresponding style. This is known as adaptive instance normalization (AdaIN).

This AdaIN operation is added to each block of the generator network and helps in deciding which features end up in the output image.
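
A minimal sketch of the AdaIN operation could look like this (PyTorch; the affine layer producing ys and yb from w, and centering the scale around 1, are implementation assumptions):

import torch.nn as nn

class AdaIN(nn.Module):
    # Adaptive instance normalization: normalize each feature map, then scale and bias it with a style
    def __init__(self, w_dim, num_channels):
        super().__init__()
        # learned affine transformation "A": maps w to per-channel scale (ys) and bias (yb)
        self.affine = nn.Linear(w_dim, 2 * num_channels)

    def forward(self, x, w):
        ys, yb = self.affine(w).chunk(2, dim=1)   # each of shape (batch, channels)
        ys = ys[:, :, None, None] + 1.0           # scale, centered around 1
        yb = yb[:, :, None, None]                 # bias
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True) + 1e-8
        return ys * (x - mean) / std + yb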

3. Bilinear Upsampling

This generator network grows progressively. Usually, upsampling in a generator network is done with transposed convolution layers, but StyleGAN uses bilinear upsampling to upsample the image instead of a transposed convolution layer.
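
Purely as an illustration (the channel counts are placeholders, not the paper’s exact configuration), replacing a transposed convolution with bilinear upsampling followed by a convolution could look like this:

import torch.nn as nn

# traditional upsampling with a transposed convolution
up_transposed = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)

# bilinear upsampling followed by a convolution, as preferred in StyleGAN-style generators
up_bilinear = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    nn.Conv2d(512, 256, kernel_size=3, padding=1),
)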

4. Noise Layers

As you can see in the StyleGAN architecture, noise layers are added after each block of the generator network (synthesis network). This noise is uncorrelated Gaussian noise that is first broadcast to the shape of the feature maps of each convolutional block using a learned scaling layer “B”. By adding this noise, StyleGAN can introduce stochastic variations into the output.

There are many stochastic features in a human face, such as hairs, stubble, freckles, or skin pores. In a traditional generator, there is only a single source of noise (the input vector) to provide these stochastic variations, which is not very effective. Adding noise at each block of the synthesis network makes sure that the noise only affects the stochastic aspects of the face.
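
A minimal sketch of such a noise injection layer (PyTorch; modelling layer “B” as a learned per-channel scaling factor is an assumption based on the paper’s description):

import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    # Adds per-pixel Gaussian noise, scaled per channel by learned weights ("B")
    def __init__(self, num_channels):
        super().__init__()
        self.scale = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x):
        noise = torch.randn(x.size(0), 1, x.size(2), x.size(3), device=x.device)
        return x + self.scale * noise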

5. Style Mixing

This is basically a regularization technique. During training, some images are generated using two latent codes: two latent codes z1 and z2 are mapped to styles w1 and w2 using the mapping network. A split point is then selected in the synthesis network; style w1 is applied up to that point and style w2 is applied after it, and the network is trained in this way.

In the synthesis network these styles are added at each block, so the network could learn to assume that adjacent styles are correlated. Style mixing prevents the network from making this assumption.
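
A rough sketch of how the per-block styles could be mixed (the mapping network, the number of synthesis blocks and the latent size are placeholders here):

import random
import torch

def mixed_styles(mapping_network, num_blocks, z_dim=512):
    # Map two latent codes to styles and pick a random split point in the synthesis network
    z1, z2 = torch.randn(1, z_dim), torch.randn(1, z_dim)
    w1, w2 = mapping_network(z1), mapping_network(z2)
    split = random.randint(1, num_blocks - 1)
    # w1 drives the blocks before the split point, w2 the blocks after it
    return [w1 if i < split else w2 for i in range(num_blocks)]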


These were the basic changes made to the baseline architecture to create the StyleGAN architecture. Other things, such as the discriminator architecture, mini-batch sizes, Adam hyperparameters and the exponential moving average of the generator weights, are kept the same as in the baseline architecture.

Summary

StyleGAN has proven to be promising at producing high-quality, realistic images while also giving control over the particular features of the generated image. It is clear that traditional generators lag far behind this improved generator network. Concepts like the mapping network and AdaIN can be very helpful in other GAN architectures and research work.

Referenced Research Papers: 1. A Style-Based Generator Architecture for Generative Adversarial Networks 2. Progressive Growing of GANs for Improved Quality, Stability, and Variation

Hope you enjoy reading.

If you have any doubts/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Cycle-Consistent Generative Adversarial Networks (CycleGAN)

In this blog, we will learn how to perform image-to-image translation using CycleGAN. Image-to-image translation is a type of computer vision problem where an image is transformed from one domain to another domain, for example from edges to a photo.

Image-to-image translation generally requires a paired set of images to train a model. We can perform this type of translation using conditional GANs, but in those cases a paired set of images is required. Take a look at a paired set of images for translating edges to a photo:

But in many cases, collecting a paired set of training images is quite difficult. Say we want an object transfiguration model that translates an image of a horse into an image of a zebra and vice versa.

For these types of tasks, even the desired output is not well defined, so how can we collect a paired set of images? To solve this problem the authors proposed an approach called CycleGAN, which transfers an image from the X domain to the Y domain without a paired set of examples.

Cycle Consistent GAN

A CycleGAN captures the special characteristics of one image domain and figures out how these characteristics can be translated into another image domain, all without paired training examples. Let’s look at some unpaired training data.

Problem with these translations: In the case of paired training examples, the network is supervised by the corresponding label images. But in the case of an unpaired training dataset, we can only supervise at the set level, where the sets are the X domain and the Y domain. To train such a network we need to find a mapping G: X → Y such that the outputs of G(X) are indistinguishable from the Y domain. The number of possible mappings G is infinite, which does not guarantee meaningful input and output image pairs. Such a network can also suffer from mode collapse, which occurs when all input images map to the same output image.

Cycle consistent: To cope with the problem stated above, the authors of the paper proposed that the translation should be “cycle consistent”. For example, if we translate an English sentence to a French sentence and then translate it back to English, we should arrive at the original sentence. Similarly, in the case of images, if we translate an image from the X domain to the Y domain using a mapping G and then translate G(X) back to X using a mapping F, we should arrive back at the original image.

So CycleGAN consists of two GANs, each with its own generator and discriminator network. To train the network, it uses two adversarial losses and one cycle consistency loss. Let’s see its mathematical formulation.

Mathematical Formulation of CycleGAN

Say we have two image domains X and Y. Our model includes two mappings G: X → Y and F: Y → X, along with two adversarial discriminators DX and DY. DX discriminates between F(Y) and X-domain images, and similarly DY discriminates between G(X) and Y-domain images. We also have a cycle consistency loss to prevent the learned mappings G and F from contradicting each other.

In the above figure (a), you can see the two different mappings G and F. Figures (b) and (c) illustrate the forward cycle consistency loss ( x → G(x) → F(G(x)) ≈ x ) and the backward cycle consistency loss ( y → F(y) → G(F(y)) ≈ y ) respectively.

Network Architecture

There are two different architectures, one for the generator and one for the discriminator network.

The generator network follows an encoder-decoder architecture with three main parts (a sketch follows the description below):

  1. Encoder
  2. Transformer
  3. Decoder

The encoder consists of three convolutional layers. An input image is passed through this encoder network and feature volumes are produced as output. The transformer consists of 6 residual blocks; it takes the feature volumes generated by the encoder as input and outputs transformed features. Finally, the decoder works as a set of deconvolutional layers; it takes the output of the transformer and generates a new image.
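
A condensed sketch of such a generator (PyTorch; the exact filter counts, normalization and padding choices follow common CycleGAN implementations and are assumptions here):

import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.InstanceNorm2d(channels), nn.ReLU(True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

def cyclegan_generator(num_residual_blocks=6):
    layers = [  # encoder: three convolutional layers
        nn.Conv2d(3, 64, 7, padding=3), nn.InstanceNorm2d(64), nn.ReLU(True),
        nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.InstanceNorm2d(128), nn.ReLU(True),
        nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.InstanceNorm2d(256), nn.ReLU(True),
    ]
    layers += [ResidualBlock(256) for _ in range(num_residual_blocks)]  # transformer: residual blocks
    layers += [  # decoder: upsample back to the input resolution
        nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1), nn.InstanceNorm2d(128), nn.ReLU(True),
        nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.InstanceNorm2d(64), nn.ReLU(True),
        nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),
    ]
    return nn.Sequential(*layers)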

The discriminator network is a simple network. It takes an image as input and predicts whether it belongs to the real dataset or to the fake, generated image dataset.


This discriminator network is basically a PatchGAN. A PatchGAN is a simple convolutional network; the only difference is that instead of mapping the input image to a single scalar output, it maps the input image to an NxN array of outputs. Each element of the NxN output corresponds to a patch of the input image; in CycleGAN each element maps to a 70×70 patch. Finally, we take the mean of this output and optimize it to decide whether the image is real or fake. The advantages of using a PatchGAN over a normal GAN discriminator are that it has fewer parameters and that it can work with arbitrarily sized images.
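
A minimal 70×70 PatchGAN discriminator could be sketched as follows (PyTorch; the filter counts follow the common implementation and are assumptions here):

import torch.nn as nn

def patchgan_discriminator():
    def block(in_ch, out_ch, stride):
        return [nn.Conv2d(in_ch, out_ch, 4, stride=stride, padding=1),
                nn.InstanceNorm2d(out_ch), nn.LeakyReLU(0.2, True)]
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
        *block(64, 128, 2), *block(128, 256, 2), *block(256, 512, 1),
        nn.Conv2d(512, 1, 4, stride=1, padding=1),  # NxN map of predictions, one per 70x70 patch
    )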

Loss Function

Adversarial losses are applied to both mappings G and F, using the discriminators DX and DY. These losses make sure that the model is trained to generate data indistinguishable from real data in both image domains.

Adversarial losses alone cannot guarantee that the learned function maps an individual input x to the desired output y, so we also need a cycle consistency loss. The cycle consistency loss makes sure that the image translation cycle is able to bring x back to the original image, i.e., x → G(x) → F(G(x)) ≈ x. The full loss can now be written as follows:

L(G, F, DX, DY) = LGAN(G, DY, X, Y) + LGAN(F, DX, Y, X) + λLcyc(G, F)

The first two terms in the loss function are the adversarial losses for the two mappings, and the last term is the cycle consistency loss. Here λ controls the relative importance of the cycle consistency loss; the authors originally set it to 10.
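
Putting this together, a hedged sketch of the combined generator-side objective for one batch (the generators G and F and the discriminators D_X and D_Y are assumed to be defined, for example as above; the discriminators are trained separately with their own losses):

import torch

def cyclegan_generator_loss(G, F, D_X, D_Y, real_x, real_y, lam=10.0):
    fake_y, fake_x = G(real_x), F(real_y)
    # adversarial terms (least-squares GAN form, as used by the authors)
    adv_g = ((D_Y(fake_y) - 1.0) ** 2).mean()
    adv_f = ((D_X(fake_x) - 1.0) ** 2).mean()
    # forward and backward cycle consistency terms (L1)
    cyc = (torch.abs(F(fake_y) - real_x).mean()
           + torch.abs(G(fake_x) - real_y).mean())
    return adv_g + adv_f + lam * cyc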

CycleGAN has produced compelling results in many cases, but it also has some limitations. That’s all for the CycleGAN introduction. In the next blog we will implement this algorithm in Keras.

Hope you enjoy reading.

If you have any doubts/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Referenced Research Paper: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Single Image Super-Resolution Using a Generative Adversarial Network

In recent years, neural networks have produced various breakthroughs in different areas. One of their promising results can be seen in super-resolving an image at large up-scaling factors, as shown below.

Isn’t it difficult to produce a high resolution image from a low resolution image?

In the paper Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, the authors use a generative adversarial network for super-resolution and are able to produce photo-realistic natural images for 4x up-scaling factors.

In the paper, the authors use a generative adversarial network (GAN) to produce a super-resolved image from a low-resolution image. In this blog we will see the following:

  1. Architecture of GAN used in the paper.
  2. Loss function used for this problem.

Adversarial Network Architecture Used in the Paper

The paper also uses one discriminator and one generator model. Here, the generator is fed with LR (low-resolution) images and tries to generate images that are difficult for the discriminator to tell apart from real HR (high-resolution) images.


Generator Network: The input LR image is first passed through a 9×9 convolution with 64 filters and a ParametricReLU activation. Then B residual blocks are applied, each having a 3×3 kernel with 64 filters followed by batch normalization and ParametricReLU. Finally, two sub-pixel convolution layers are applied to up-sample the image by 4x.
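
A simplified sketch of this generator (PyTorch; it omits the long skip connection around the residual blocks shown in the paper’s figure, and the padding choices and block count B=16 are assumptions):

import torch.nn as nn

class SRResBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

def srgan_generator(num_blocks=16):
    def upsample():
        # sub-pixel convolution: conv to 256 channels, then PixelShuffle rearranges them into 2x resolution
        return [nn.Conv2d(64, 256, 3, padding=1), nn.PixelShuffle(2), nn.PReLU()]
    return nn.Sequential(
        nn.Conv2d(3, 64, 9, padding=4), nn.PReLU(),   # 9x9 kernels, 64 filters
        *[SRResBlock() for _ in range(num_blocks)],   # B residual blocks
        *upsample(), *upsample(),                     # two sub-pixel layers -> 4x up-scaling
        nn.Conv2d(64, 3, 9, padding=4),
    )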

Discriminator Network: There is also a discriminator which discriminates real HR images from generated SR images. It contains eight convolutional layers with an increasing number of 3 × 3 filter kernels, increasing by a factor of 2 from 64 to 512. Strided convolutions are used to reduce the image resolution each time the number of features is doubled. The resulting 512 feature maps are followed by two dense layers and a final sigmoid activation to obtain a probability that the image is real or fake.
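
A hedged sketch of this discriminator (PyTorch; the paper flattens the final 512 feature maps directly, whereas the pooling used here to keep the sketch input-size-agnostic is a simplification):

import torch.nn as nn

def srgan_discriminator():
    def conv(in_ch, out_ch, stride):
        return [nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
                nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2, True)]
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=1, padding=1), nn.LeakyReLU(0.2, True),  # first of eight conv layers
        *conv(64, 64, 2), *conv(64, 128, 1), *conv(128, 128, 2),
        *conv(128, 256, 1), *conv(256, 256, 2), *conv(256, 512, 1), *conv(512, 512, 2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(512, 1024), nn.LeakyReLU(0.2, True),  # two dense layers
        nn.Linear(1024, 1), nn.Sigmoid(),               # probability of real vs. fake
    )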

Loss Function: In the paper, the authors define a perceptual loss function which consists of a content loss and an adversarial loss.

The adversarial loss trains the generator to produce natural-looking images that are difficult for the discriminator to distinguish from real images. In addition, they use a content loss motivated by perceptual similarity.

For the content loss, mean squared error is the most widely used loss function, but it often results in perceptually unsatisfying, over-smoothed content. To resolve this problem, the authors of the paper use a loss function that is closer to perceptual similarity: they define a VGG loss based on the ReLU activation layers of a pre-trained 19-layer VGG network.
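
A hedged sketch of such a VGG content loss (PyTorch/torchvision; slicing the feature extractor at index 36 roughly corresponds to the deepest ReLU activation and is an assumption here, as is omitting ImageNet input normalization):

import torch.nn.functional as F
from torchvision import models

# feature extractor: layers of a pre-trained VGG19 up to a chosen ReLU activation
vgg_features = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:36].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def vgg_content_loss(sr_image, hr_image):
    # MSE between VGG feature maps of the super-resolved and real high-resolution images
    return F.mse_loss(vgg_features(sr_image), vgg_features(hr_image))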

They performed experiments on Set5, Set14 and BSD100 and tested on BSD300, achieving promising results. To evaluate the results obtained by SRGAN, the authors also collected mean opinion scores from 26 raters and found that the generated images look very similar to the original images.

Referenced Research Paper: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

GitHub: Super Resolution Examples

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.