Style Generative Adversarial Network (StyleGAN)

A generative adversarial network (GAN) generates synthetic images that are meant to be indistinguishable from authentic images. A GAN consists of a generator network and a discriminator network. The generator tries to produce new images from a noise vector, and the discriminator tries to tell these generated images apart from images in the original dataset. During training, the generator tries to fool the discriminator while the discriminator improves at differentiating real images from fake ones. Ideally, training continues until the discriminator is fooled about half the time, which means the generator is producing data that closely matches the original data distribution.
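The "fooled half the time" end point can be made concrete with the standard binary cross-entropy discriminator loss. The snippet below is only a minimal sketch for intuition; the function name `discriminator_loss` and the example probabilities are assumptions made for this illustration.

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy loss for one real and one generated sample.

    d_real: probability the discriminator assigns to "real" for a real image.
    d_fake: probability the discriminator assigns to "real" for a generated image.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

# A confident, correct discriminator has a low loss.
confident = discriminator_loss(d_real=0.99, d_fake=0.01)

# When the discriminator can no longer tell real from fake it outputs 0.5
# for every image, and its loss settles at 2 * ln(2).
equilibrium = discriminator_loss(d_real=0.5, d_fake=0.5)
```

In practice GAN training rarely reaches this point exactly, so the value is a reference point rather than a stopping rule.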

Since the introduction of generative adversarial networks in 2014, there have been many improvements to the architecture: deep convolutional GAN, semi-supervised GAN, conditional GAN, CycleGAN and many more. These variants mainly focus on improving the discriminator, while the generator continues to operate as a black box.

StyleGAN proposes an alternative generator architecture that can control specific features of the output image, such as pose, identity, hair and freckles (when trained on a face dataset), without compromising image quality.

Baseline Architecture

The baseline architecture for StyleGAN is taken from another GAN variant: Progressive GAN. In Progressive GAN, both the generator and the discriminator grow progressively: training starts at low resolution, and layers that model increasingly fine details are added as training proceeds. Images start at 4×4 and grow up to 1024×1024. This progressively growing architecture speeds up and stabilizes training, which helps in generating such high-quality images.
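As a small illustration of the growth schedule, the sketch below lists the resolutions visited when each stage doubles the width and height. The helper name `progressive_resolutions` is hypothetical and used only for this example.

```python
def progressive_resolutions(start: int = 4, final: int = 1024) -> list[int]:
    """Return the resolutions visited when every stage doubles the image size."""
    resolutions = [start]
    while resolutions[-1] < final:
        resolutions.append(resolutions[-1] * 2)
    return resolutions

schedule = progressive_resolutions()
print(schedule)  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

Going from 4×4 to 1024×1024 therefore takes nine resolution stages.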

StyleGAN Architecture

Progressive GAN was able to generate high-quality images, but controlling specific features of the generated image was difficult with its architecture. To make the features of the output image controllable, changes were made to Progressive GAN’s generator architecture, and StyleGAN was the result. Here is the architecture of the generator for StyleGAN.

Along with the generator’s architecture, the above figure also contrasts a traditional generator network with a style-based generator network. StyleGAN’s generator is obtained by making several modifications to the Progressive GAN generator. We will discuss these modifications one by one.

1. Removal of Traditional Input Layer

In traditional generator networks, a latent vector is provided through an input layer. This latent vector must follow the probability density of the training data, which can lead to some degree of entanglement. For example, if the training data contains far more images of one type than of others, the generator tends to produce images with features tied to that dominant type. So instead of a traditional input layer, the synthesis network (generator network) starts from a learned 4 × 4 × 512 constant tensor.
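A minimal NumPy sketch of this idea follows. It assumes a channels-first layout (512, 4, 4) and fills the constant with ones purely for illustration; in a real model the tensor would be a trainable parameter. The names `const_input` and `synthesis_input` are hypothetical.

```python
import numpy as np

# Stand-in for the learned constant. In a trained model these values are
# parameters updated by backpropagation; ones are used here for illustration.
const_input = np.ones((512, 4, 4), dtype=np.float32)

def synthesis_input(batch_size: int) -> np.ndarray:
    """Return the same constant tensor for every sample in the batch."""
    return np.broadcast_to(const_input, (batch_size, *const_input.shape)).copy()

x = synthesis_input(batch_size=3)
print(x.shape)  # (3, 512, 4, 4)
```

Every sample starts from identical feature maps, so all variation between generated images has to come from the styles and the noise inputs described in the following sections.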

2. Mapping Network and AdaIN

The mapping network embeds the input latent code z into an intermediate latent space W. The resulting intermediate latent vector w is used as the style and is incorporated at each block of the synthesis network. As you can see in the generator’s architecture above, the latent code is fed through 8 fully connected layers to produce w.
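The mapping network can be sketched as a plain 8-layer MLP. The code below is a rough NumPy illustration, not the official implementation: the random weights stand in for trained parameters, and the leaky ReLU slope, the weight scale and the input normalization step are assumptions of this sketch.

```python
import numpy as np

LATENT_DIM = 512  # dimensionality used here for both z and w
NUM_LAYERS = 8    # fully connected layers in the mapping network

rng = np.random.default_rng(0)
# Random weights stand in for trained parameters (assumption for illustration).
weights = [rng.normal(0.0, 0.02, (LATENT_DIM, LATENT_DIM)) for _ in range(NUM_LAYERS)]
biases = [np.zeros(LATENT_DIM) for _ in range(NUM_LAYERS)]

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def mapping_network(z):
    """Map latent codes z of shape (batch, 512) to intermediate codes w."""
    # Normalize each z to unit average magnitude before the MLP.
    h = z / np.sqrt(np.mean(z ** 2, axis=1, keepdims=True) + 1e-8)
    for weight, bias in zip(weights, biases):
        h = leaky_relu(h @ weight + bias)
    return h

z = rng.standard_normal((4, LATENT_DIM))
w = mapping_network(z)
print(w.shape)  # (4, 512)
```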

This intermediate latent vector w is passed through a learned affine transformation “A” (shown in the architecture), which specializes w into styles y = (ys, yb) for each block of the generator network. To apply a style, each feature map xi is first normalized separately and then scaled by ys,i and shifted by yb,i. This operation is known as adaptive instance normalization (AdaIN): AdaIN(xi, y) = ys,i · (xi − μ(xi)) / σ(xi) + yb,i.

This AdaIN operation is added to each block of the generator network, so the style determines which features are emphasized at each resolution.
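The AdaIN step can be sketched in NumPy as follows. The layer “A” is modeled here as a learned affine transformation, as in the StyleGAN paper; the random `affine_weight`, the bias that starts the scale at 1, and the tensor sizes are assumptions chosen for this illustration.

```python
import numpy as np

def adain(x, y_s, y_b, eps=1e-8):
    """Adaptive instance normalization.

    x:   feature maps, shape (batch, channels, H, W)
    y_s: per-channel style scale, shape (batch, channels)
    y_b: per-channel style bias, shape (batch, channels)
    """
    mean = x.mean(axis=(2, 3), keepdims=True)  # per sample, per channel
    std = x.std(axis=(2, 3), keepdims=True)
    normalized = (x - mean) / (std + eps)
    return y_s[:, :, None, None] * normalized + y_b[:, :, None, None]

rng = np.random.default_rng(0)
channels, w_dim = 16, 512

# "A": affine map from w to a scale and a bias per channel (stand-in weights).
affine_weight = rng.normal(0.0, 0.02, (w_dim, 2 * channels))
affine_bias = np.concatenate([np.ones(channels), np.zeros(channels)])

w = rng.standard_normal((2, w_dim))
style = w @ affine_weight + affine_bias
y_s, y_b = style[:, :channels], style[:, channels:]

x = rng.standard_normal((2, channels, 8, 8))
out = adain(x, y_s, y_b)
```

After the operation, each output feature map has mean yb,i and standard deviation |ys,i|, so the statistics of the incoming features are replaced by those dictated by the style.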

3. Bilinear Upsampling

The generator network grows progressively, so feature maps must be upsampled between blocks. Generator networks often upsample with transposed convolution layers, and the Progressive GAN baseline uses nearest-neighbor upsampling. StyleGAN instead uses bilinear upsampling.
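To show what bilinear upsampling does, here is a generic 2× bilinear interpolation in NumPy for a single feature map. This is an illustrative sketch under the usual pixel-center convention, not the exact filter-based code of any particular StyleGAN implementation; the function name `upsample_bilinear_2x` is hypothetical.

```python
import numpy as np

def upsample_bilinear_2x(x: np.ndarray) -> np.ndarray:
    """Upsample a 2-D feature map of shape (H, W) to (2H, 2W)."""
    h, w = x.shape
    # Source coordinates of each output pixel center, clamped to the image.
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0  # vertical interpolation weights
    wx = xs - x0  # horizontal interpolation weights
    # Interpolate horizontally on the two neighboring rows, then vertically.
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy)[:, None] + bottom * wy[:, None]

image = np.arange(16.0).reshape(4, 4)
upsampled = upsample_bilinear_2x(image)
print(upsampled.shape)  # (8, 8)
```

Unlike nearest-neighbor upsampling, which repeats pixels, each output value is a weighted average of up to four neighbors, which gives smoother feature maps.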

4. Noise Layers

As you can see in the architecture of StyleGAN, noise inputs are added in each block of the generator network (synthesis network). The noise is a single-channel image of uncorrelated Gaussian noise, which is broadcast to all feature maps of the convolutional block using learned per-channel scaling factors “B” and then added to them. With this added noise, StyleGAN can introduce stochastic variation into the output.

A human face has many stochastic features, such as the exact placement of hair, stubble, freckles or skin pores. A traditional generator has only a single source of noise, the input vector, from which to produce these stochastic variations, which is not very effective. Adding noise at each block of the synthesis network gives the generator a direct source of randomness, and this noise affects only the stochastic aspects of the face.
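The noise injection can be sketched as follows. This is a minimal NumPy illustration in which “B” is modeled as one learned scaling factor per channel; the function name `add_noise` and the example scale values are assumptions for this sketch.

```python
import numpy as np

def add_noise(features, noise_scale, rng):
    """Add one Gaussian noise image per sample to every feature map.

    features:    shape (batch, channels, H, W)
    noise_scale: per-channel scaling factors, shape (channels,)
    """
    batch, _, height, width = features.shape
    # A single-channel noise image per sample, shared by all channels.
    noise = rng.standard_normal((batch, 1, height, width))
    # Broadcasting applies each channel's own scale to the shared noise image.
    return features + noise * noise_scale[None, :, None, None]

rng = np.random.default_rng(0)
features = np.zeros((2, 4, 8, 8))
noise_scale = np.array([0.0, 0.1, 0.5, 1.0])  # stand-in for learned values
noisy = add_noise(features, noise_scale, rng)
```

A channel whose scale is 0 ignores the noise completely, so the network can learn which feature maps should carry stochastic detail and which should not.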

5. Style Mixing

This is basically a regularization technique. During training, some of the images are generated using two latent codes instead of one. Two latent codes z1 and z2 are passed through the mapping network to produce styles w1 and w2. A crossover point is then selected in the synthesis network: w1 is applied up to that point and w2 is applied after it, and the network is trained this way.

In the synthesis network, a style is applied at each block. If all blocks always received the same style, the network could come to assume that adjacent styles are correlated. Style mixing prevents the network from making this assumption.
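A minimal sketch of the crossover step, assuming 18 style inputs (two per resolution from 4×4 to 1024×1024) and 512-dimensional style vectors. The random vectors stand in for mapping-network outputs, and the function name `mix_styles` is hypothetical.

```python
import numpy as np

NUM_STYLE_LAYERS = 18  # two style inputs per resolution, 4x4 up to 1024x1024
W_DIM = 512

def mix_styles(w1, w2, crossover):
    """Use w1 for layers before the crossover point and w2 from it onward."""
    layer_index = np.arange(NUM_STYLE_LAYERS)[:, None]
    return np.where(layer_index < crossover, w1[None, :], w2[None, :])

rng = np.random.default_rng(0)
w1 = rng.standard_normal(W_DIM)  # stand-in for mapping_network(z1)
w2 = rng.standard_normal(W_DIM)  # stand-in for mapping_network(z2)

# Pick a random crossover so that both styles are used at least once.
crossover = int(rng.integers(1, NUM_STYLE_LAYERS))
styles = mix_styles(w1, w2, crossover)
print(styles.shape)  # (18, 512)
```

Row i of `styles` is the style vector fed to the i-th style input of the synthesis network, so the coarse layers follow w1 and the finer layers follow w2.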


These were the basic changes made to the baseline architecture to create StyleGAN. Other aspects, such as the discriminator architecture, mini-batch sizes, Adam hyperparameters and the exponential moving average of the generator, are the same as in the baseline.

Summary

StyleGAN has proven promising at producing high-quality, realistic images, and it also gives control over generating images with particular features. Traditional generators lag well behind this improved generator network. Concepts like the mapping network and AdaIN can be very helpful in GAN architectures and other research work.

Referenced Research Papers:
1. A Style-Based Generator Architecture for Generative Adversarial Networks
2. Progressive Growing of GANs for Improved Quality, Stability, and Variation

Hope you enjoyed reading.

If you have any questions or suggestions, please feel free to ask and I will do my best to help or to improve the article. Good-bye until next time.
