Since the introduction of AlexNet in 2012, convolutional neural networks (CNNs) have been the go-to approach for a wide range of image problems. They perform remarkably well in image classification, object detection, semantic segmentation, and many other tasks.
But are CNNs the best solution to image problems? Do they capture all the features present in an image when predicting the output?
Problems with Convolutional Neural Networks:
- CNNs use pooling layers to reduce the number of parameters and speed up computation. In that process, they lose some useful feature information.
- CNNs also require a huge amount of training data; otherwise they will not reach high accuracy on the test set.
- CNNs essentially try to achieve "viewpoint invariance", meaning that small changes in the input should not change the output. They also do not store the relative spatial relationships between features.
To solve these problems we need a better approach, and that is where the capsule network comes in: a network that has given an early indication that it can solve the problems associated with convolutional neural networks. Geoffrey E. Hinton et al. recently published a paper named "Dynamic Routing Between Capsules", in which they introduced the capsule network and the dynamic routing algorithm.
What is a Capsule Network?
A capsule is a group of neurons that uses a vector to represent an object or an object part. The length of the vector represents the probability that the object is present, while its orientation encodes the object's pose (instantiation parameters such as position, rotation, and scale).
- A capsule network tries to achieve "equivariance": when the input changes a little, the output vector also changes, but its length stays the same, so it still predicts the presence of the same object.
- Capsule networks also require less data for training because they preserve the spatial relationships between features.
- Capsule networks do not use pooling layers, which removes the problem of losing useful feature information.
How Does a Capsule Network Work?
Usually in CNNs we deal with layers, i.e. one layer passes information to the next layer and so on. CapsNet follows the same flow, as shown below.
The diagram shown above represents the network architecture used in the paper for the MNIST dataset. The initial layer applies a convolution to extract low-level features from the image and passes them to a primary capsule layer.
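To make the shapes concrete, here is a minimal Keras sketch of those first layers, using the filter sizes reported in the paper for MNIST (the variable names and layer arrangement are my own simplification):

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1))                  # MNIST image

# Conv1: 256 feature maps, 9x9 kernel, stride 1, ReLU -> (20, 20, 256)
conv1 = layers.Conv2D(256, kernel_size=9, strides=1, activation='relu')(inputs)

# PrimaryCaps: 9x9 convolution, stride 2, 32 capsule channels of 8 dimensions -> (6, 6, 256)
primary = layers.Conv2D(32 * 8, kernel_size=9, strides=2)(conv1)

# Reshape into 6 * 6 * 32 = 1152 capsules, each an 8-dimensional vector
primary_caps = layers.Reshape((6 * 6 * 32, 8))(primary)
```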
The primary capsule layer reshapes the output of the previous convolutional layer into capsules containing vectors of equal dimension. The length of each of these vectors represents the probability that an object is present, which is why we also need a non-linear "squashing" function that scales the length of every vector to between 0 and 1.
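The squashing function used in the paper is:

$$ v_j \;=\; \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \cdot \frac{s_j}{\lVert s_j \rVert} $$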
where s_j is the input vector of a capsule, ||s_j|| is its norm, and v_j is the output vector. These squashed vectors are the output of the primary capsule layer. The capsules in the next layer are then generated using the dynamic routing algorithm, which works as follows.
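In code, the squashing function is just a few lines; here is a small NumPy sketch of the formula above (the epsilon term is my addition, for numerical stability):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    squared_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm)
    return scale * s / np.sqrt(squared_norm + eps)
```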
Routing Algorithm:
The main feature of the routing algorithm is agreement between capsules: a lower-level capsule sends its output to the higher-level capsules that agree with it.
Take an image of a face as an example. Suppose a lower layer has four capsules representing the mouth, nose, left eye, and right eye respectively. If all four agree on the same face pose, they send their values to the output-layer capsule that signals the presence of a face.
To produce the output of the routing capsules (the capsules in the higher layer), the output of the lower layer (u) is first multiplied by a weight matrix W, and then weighted by a coupling coefficient c. This c determines which capsule in the lower layer sends its output to which capsule in the higher layer.
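In the paper's notation, the prediction vectors and the total input to a higher-level capsule j are:

$$ \hat{u}_{j|i} = W_{ij}\, u_i, \qquad s_j = \sum_i c_{ij}\, \hat{u}_{j|i} $$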
The coupling coefficients c are determined iteratively by the routing process. For each capsule i in the lower layer, the coupling coefficients sum to 1, which preserves the probabilistic interpretation that a vector's length represents the probability that an object is present. c is computed by applying a softmax to the logits b, whose initial values are set to zero.
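Concretely, for each lower-level capsule i the softmax runs over the higher-level capsules j:

$$ c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})} $$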
The routing agreement is measured by updating the logits b: the previous b is incremented by the scalar product between the output of the higher-layer capsule and the prediction coming from the lower-layer capsule (shown in line 7 of the algorithm below).
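Putting these pieces together, here is a minimal NumPy sketch of the routing procedure. It is my own simplification, reuses the squash() helper defined earlier, and assumes the predictions û_{j|i} have already been computed:

```python
import numpy as np

def dynamic_routing(u_hat, num_iterations=3):
    """Routing-by-agreement between two capsule layers.

    u_hat: array of shape (num_lower, num_higher, dim_higher) holding the
           precomputed predictions u_hat_{j|i} = W_ij u_i.
    Returns the output vectors v with shape (num_higher, dim_higher).
    """
    num_lower, num_higher, _ = u_hat.shape
    b = np.zeros((num_lower, num_higher))        # routing logits, initialised to zero

    for _ in range(num_iterations):
        # c_ij: softmax of b over the higher-level capsules j (each row sums to 1)
        c = np.exp(b) / np.sum(np.exp(b), axis=1, keepdims=True)
        # s_j = sum_i c_ij * u_hat_{j|i}
        s = np.sum(c[:, :, None] * u_hat, axis=0)
        # v_j = squash(s_j), using the squash() helper from the earlier snippet
        v = squash(s)
        # agreement update: b_ij += u_hat_{j|i} . v_j
        b = b + np.sum(u_hat * v[None, :, :], axis=-1)
    return v
```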
To further improve the capsule layer's estimates, the authors added a decoder network. The decoder tries to reconstruct the original image from the output of the digit capsule layer; it simply adds a few fully connected layers on top of that output.
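Here is a minimal Keras sketch of such a decoder, assuming the digit-capsule outputs are masked and flattened into a single 16 × 10 vector (the layer sizes follow the paper):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Reconstructs a 28x28 MNIST image from the masked digit-capsule output.
decoder = keras.Sequential([
    keras.Input(shape=(16 * 10,)),            # 10 digit capsules of 16 dimensions, flattened
    layers.Dense(512, activation='relu'),
    layers.Dense(1024, activation='relu'),
    layers.Dense(784, activation='sigmoid'),  # 28 * 28 reconstructed pixel intensities
    layers.Reshape((28, 28, 1)),
])
```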
We have now covered the basic concepts of a capsule network. The best way to gain a deeper understanding is to implement it in code, which you can see in the next blog.
The Next Blog : Implementing Capsule Network in Keras
Referenced Research Paper: Dynamic Routing Between Capsules
Hope you enjoyed reading.
If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve. Good-bye until next time.