When we see a machine learning problem involving images, the first thing that comes to mind is a CNN (convolutional neural network). Different convolutional networks such as LeNet, AlexNet, VGG16, VGG19, and ResNet are used to solve different problems, whether supervised (classification) or unsupervised (image generation). Over the years, deeper and deeper CNN architectures have been used: as problems become more complex, deeper networks are preferred. But with deeper networks, the problem of vanishing gradients arises.
To solve this problem, Gao Huang et al. introduced Dense Convolutional Networks (DenseNets). DenseNets have several compelling advantages:
- alleviate the vanishing-gradient problem
- strengthen feature propagation
- encourage feature reuse
- substantially reduce the number of parameters
How does DenseNet work?
Recent architectures like ResNet also try to solve the problem of vanishing gradients. ResNet passes information from one layer to the next via identity connections; features are combined through summation before being passed into the next layer.
DenseNet, in contrast, connects each layer to all of its subsequent layers in a feed-forward fashion (as shown in the figure below). Features are combined through concatenation, not summation.
(Figure: dense connectivity within a dense block. Source: DenseNet paper)
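To make the difference concrete, here is a small Keras sketch (the 16-channel shapes are just illustrative): a ResNet-style block merges two tensors by element-wise addition, so the channel count stays the same, while a DenseNet-style block concatenates them along the channel axis, so the channel count grows.

```python
from keras.layers import Input, Conv2D, Add, Concatenate

inputs = Input(shape=(28, 28, 16))
features = Conv2D(16, (3, 3), padding='same')(inputs)

# ResNet-style combination: element-wise sum, output still has 16 channels
resnet_style = Add()([inputs, features])

# DenseNet-style combination: concatenation, output has 16 + 16 = 32 channels
densenet_style = Concatenate(axis=-1)([inputs, features])
```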
The ResNet architecture preserves information explicitly through identity connections, and recent variants of ResNet show that many layers contribute very little and can in fact be randomly dropped during training. The DenseNet architecture explicitly differentiates between information that is added to the network and information that is preserved.
In DenseNet, each layer has direct access to the gradients from the loss function and to the original input signal, leading to improved flow of information and gradients throughout the network. DenseNets also have a regularizing effect, which reduces overfitting on tasks with smaller training sets.
An important difference between DenseNet and existing network architectures is that DenseNet can have very narrow layers, e.g., k = 12. The paper refers to the hyperparameter k as the growth rate of the network: each layer in a dense block produces only k feature-maps, which are concatenated with the feature-maps of all previous layers and passed as input to the next layer.
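As a quick back-of-the-envelope illustration (the initial channel count k0 = 16 is an assumed value, not taken from the paper), the number of input channels seen by each layer inside a dense block grows linearly with the growth rate:

```python
k0, k = 16, 12  # assumed input channels and growth rate
for l in range(4):
    print(f"layer {l}: sees {k0 + k * l} input channels, adds {k} new ones")
# layer 0: sees 16 input channels, adds 12 new ones
# layer 1: sees 28 input channels, adds 12 new ones
# layer 2: sees 40 input channels, adds 12 new ones
# layer 3: sees 52 input channels, adds 12 new ones
```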
DenseNet Architecture
The best way to illustrate an architecture is with code, so I have implemented the DenseNet architecture in Keras using the MNIST data set.
A DenseNet consists of dense blocks, and each dense block consists of convolution layers. After each dense block, a transition layer is added before proceeding to the next dense block (as shown in the figure below).
Every layer in a dense block is directly connected to all its subsequent layers. Consequently, each layer receives the feature-maps of all preceding layers.
```python
from keras.layers import (Activation, AveragePooling2D, BatchNormalization,
                          Conv2D, Dropout, concatenate)  # used across the snippets below

def dense_block(block_x, filters, growth_rate, layers_in_block):
    # Each layer produces `growth_rate` feature-maps, which are concatenated
    # with all feature-maps the block has produced so far.
    for i in range(layers_in_block):
        each_layer = conv_layer(block_x, growth_rate)
        block_x = concatenate([block_x, each_layer], axis=-1)
        filters += growth_rate
    return block_x, filters
```
Each convolution layer consists of three consecutive operations: batch normalization (BN), followed by a rectified linear unit (ReLU) and a 3 × 3 convolution (Conv). Dropout can also be added, depending on your architecture's requirements.
```python
def conv_layer(conv_x, filters):
    # BN -> ReLU -> 3x3 convolution, with optional dropout
    conv_x = BatchNormalization()(conv_x)
    conv_x = Activation('relu')(conv_x)
    conv_x = Conv2D(filters, (3, 3), kernel_initializer='he_uniform',
                    padding='same', use_bias=False)(conv_x)
    conv_x = Dropout(0.2)(conv_x)
    return conv_x
```
An essential part of convolutional networks is the down-sampling layers that change the size of the feature-maps. To facilitate down-sampling, the DenseNet architecture divides the network into multiple densely connected dense blocks (as shown in the figure earlier).
The layers between blocks are transition layers, which perform convolution and pooling. A transition layer consists of a batch normalization layer and a 1×1 convolutional layer followed by a 2×2 average pooling layer.
```python
def transition_block(trans_x, tran_filters):
    # BN -> ReLU -> 1x1 convolution -> 2x2 average pooling
    trans_x = BatchNormalization()(trans_x)
    trans_x = Activation('relu')(trans_x)
    trans_x = Conv2D(tran_filters, (1, 1), kernel_initializer='he_uniform',
                     padding='same', use_bias=False)(trans_x)
    trans_x = AveragePooling2D((2, 2), strides=(2, 2))(trans_x)
    return trans_x, tran_filters
```
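To show how these pieces fit together, here is a minimal sketch of a full model for MNIST. It is not the exact script from the linked repository: the helper name build_densenet and the hyperparameters (3 dense blocks, 4 layers per block, growth rate 12) are illustrative choices.

```python
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from keras.models import Model

def build_densenet(input_shape=(28, 28, 1), num_classes=10,
                   dense_blocks=3, layers_in_block=4, growth_rate=12):
    inputs = Input(shape=input_shape)

    # Initial convolution before the first dense block
    filters = 2 * growth_rate
    x = Conv2D(filters, (3, 3), kernel_initializer='he_uniform',
               padding='same', use_bias=False)(inputs)

    # Alternate dense blocks and transition layers; the last dense block
    # is not followed by a transition layer.
    for block in range(dense_blocks):
        x, filters = dense_block(x, filters, growth_rate, layers_in_block)
        if block != dense_blocks - 1:
            x, filters = transition_block(x, filters)

    # Global average pooling and a softmax classifier over the 10 digits
    x = GlobalAveragePooling2D()(x)
    outputs = Dense(num_classes, activation='softmax')(x)
    return Model(inputs, outputs)

model = build_densenet()
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Training then just means calling model.fit on MNIST images reshaped to (28, 28, 1) with one-hot encoded labels.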
DenseNets can scale naturally to hundreds of layers while exhibiting no optimization difficulties. Because of their compact internal representations and reduced feature redundancy, DenseNets may be good feature extractors for various computer vision tasks that build on convolutional features.
The full code can be found here.
Referenced research paper: Densely Connected Convolutional Networks
Hope you enjoyed reading. If you have any doubts or suggestions, please feel free to ask and I will do my best to help or improve. Good-bye until next time.