Implementing Capsule Network in Keras

In the last blog, we saw what a capsule network is and how it can overcome the problems associated with convolutional neural networks. In this blog, we will implement a capsule network in Keras.

You can find the full code here.

Here, we will use the handwritten digit dataset (MNIST) and train a capsule network to classify the digits. The MNIST dataset consists of grayscale images of size 28*28.

The capsule network architecture is somewhat similar to a convolutional neural network, except for the capsule layers. We can break the implementation of a capsule network into the following steps:

  1. Initial convolutional layer
  2. Primary capsule layer
  3. Digit capsule layer
  4. Decoder network
  5. Loss Functions
  6. Training and testing of model

Initial Convolution Layer:

Initially, we will use a convolution layer to detect low-level features of the image. It uses 256 filters, each of size 9*9, with stride 1 and ReLU activation. Since the input image is 28*28, the output of this layer will be of size 20*20*256.
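A minimal sketch of this layer (assuming the tf.keras API; the variable names are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hedged sketch of the initial feature-extraction layer.
image = layers.Input(shape=(28, 28, 1))   # one grayscale MNIST digit
conv1 = layers.Conv2D(filters=256, kernel_size=9, strides=1,
                      padding='valid', activation='relu')(image)
conv1_model = tf.keras.Model(image, conv1)
# With no padding, each spatial dimension shrinks to 28 - 9 + 1 = 20,
# so the output is 20 x 20 x 256.
```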

Primary Capsule Layer:

The output of the previous layer is passed through another convolution with 256 filters, each of size 9*9, with a stride of 2, which produces an output of size 6*6*256. This output is then reshaped into 8-dimensional vectors, giving 6*6*32 = 1152 capsules, each of which is 8-dimensional. Each capsule then passes through a non-linear function (squash) so that the length of the output vector is kept between 0 and 1.
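A sketch of this layer, again assuming tf.keras (the `squash` definition follows the standard capsule-network formulation):

```python
import tensorflow as tf
from tensorflow.keras import layers

def squash(s, axis=-1):
    # Scales a vector so its length lies in [0, 1) without changing direction.
    sq_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / tf.sqrt(sq_norm + tf.keras.backend.epsilon())

features = layers.Input(shape=(20, 20, 256))   # output of the first conv layer
conv2 = layers.Conv2D(filters=256, kernel_size=9, strides=2,
                      padding='valid', activation='relu')(features)  # 6 x 6 x 256
caps = layers.Reshape((6 * 6 * 32, 8))(conv2)  # 1152 capsules, 8-D each
caps = layers.Lambda(squash)(caps)
primary_caps_model = tf.keras.Model(features, caps)
```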

Digit Capsule Layer:

The logic and algorithm used for this layer were explained in the previous blog. Here, we will see what we need to do in code to implement it. We need to write a custom layer in Keras. It takes 1152*8 as its input and produces an output of size 10*16, where each of the 10 capsules represents an output class with a 16-dimensional vector. Each of these 10 capsules is then converted into a single value, used to predict the output class, via a lambda layer.
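A small numpy sketch of that final lambda step (the array values are illustrative):

```python
import numpy as np

# Each 16-D digit capsule is reduced to its Euclidean length, giving one
# score per class; the longest capsule wins.
digit_caps = np.zeros((1, 10, 16))   # batch of one: 10 capsules, 16-D each
digit_caps[0, 3, :] = 0.2            # pretend the capsule for digit 3 is active
lengths = np.sqrt(np.sum(np.square(digit_caps), axis=-1))  # shape (1, 10)
predicted_class = np.argmax(lengths, axis=-1)
print(predicted_class)  # [3]
```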

Decoder Network:

To further boost the pose parameters learned by the digit capsule layer, we can add a decoder network to reconstruct the input image. The decoder network is fed an input of size 10*16 (the digit capsule layer output) and reconstructs the original 28*28 image. The decoder consists of 3 dense layers having 512, 1024 and 784 nodes.
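A sketch of the decoder under the tf.keras API (the 784-D output can then be reshaped back to 28*28):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hedged sketch: the decoder takes the flattened (masked) 10 x 16
# digit-capsule output and reconstructs the 784 pixels of a 28 x 28 image.
masked_caps = layers.Input(shape=(10 * 16,))
h = layers.Dense(512, activation='relu')(masked_caps)
h = layers.Dense(1024, activation='relu')(h)
reconstruction = layers.Dense(784, activation='sigmoid')(h)  # pixels in [0, 1]
decoder = tf.keras.Model(masked_caps, reconstruction)
```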

At training time, the input to the decoder is the output of the digit capsule layer masked with the original labels: every vector except the one corresponding to the correct label is multiplied by zero, so the decoder is trained only on the correct digit capsule. At test time, the input to the decoder is the same digit capsule layer output, but masked with the longest vector in that layer. Let's see the code.
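The masking step itself can be sketched in numpy (names are illustrative):

```python
import numpy as np

# Training-time mask: zero out every capsule except the one for the true label.
digit_caps = np.random.rand(10, 16)          # one sample's digit-capsule output
y_true = np.eye(10)[3]                       # one-hot label for digit "3"
masked = digit_caps * y_true[:, np.newaxis]  # rows other than row 3 become zero
decoder_input = masked.flatten()             # 160-D vector fed to the decoder
```

At test time, the same mask is built from the longest capsule instead of the label.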

Loss Functions:

It uses two loss functions: one is a probabilistic loss function used for classifying the digit images, and the other is the reconstruction loss, which is a mean squared error. The probabilistic loss is simple to understand once you look at the following code.
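The probabilistic loss here is the margin loss from the original capsule network paper (Sabour et al., 2017); a numpy sketch, assuming the paper's default constants m+ = 0.9, m- = 0.1 and lambda = 0.5:

```python
import numpy as np

# Margin loss: the correct capsule is pushed to be long (>= m_plus),
# all others are pushed to be short (<= m_minus).
def margin_loss(y_true, lengths, m_plus=0.9, m_minus=0.1, lam=0.5):
    present = y_true * np.square(np.maximum(0.0, m_plus - lengths))
    absent = lam * (1.0 - y_true) * np.square(np.maximum(0.0, lengths - m_minus))
    return np.sum(present + absent, axis=-1)

y_true = np.eye(10)[[3]]                     # one-hot label, batch of one
perfect = np.where(y_true == 1, 0.95, 0.05)  # correct capsule long, rest short
print(margin_loss(y_true, perfect))  # [0.]
```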

Training and Testing of model:

Now we define our training and testing models and train them on the MNIST digit dataset.
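Compiling and fitting then looks roughly like this. Note that `capsnet` below is only a tiny stand-in with the same input/output signature as the real network (inputs [image, label], outputs [class scores, reconstruction]); in the real model the classification loss would be the probabilistic margin loss rather than the cross-entropy used here, and the data would be real MNIST:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in two-input, two-output model (NOT the capsule network itself).
image_in = layers.Input(shape=(28, 28, 1))
label_in = layers.Input(shape=(10,))
flat = layers.Flatten()(image_in)
lengths_out = layers.Dense(10, activation='softmax', name='capsnet')(flat)
recon_out = layers.Dense(784, activation='sigmoid', name='decoder')(
    layers.Concatenate()([flat, label_in]))
capsnet = tf.keras.Model([image_in, label_in], [lengths_out, recon_out])

# The reconstruction (mse) loss is heavily down-weighted, as in the paper.
capsnet.compile(optimizer='adam',
                loss=['categorical_crossentropy', 'mse'],
                loss_weights=[1.0, 0.0005])

x = np.random.rand(8, 28, 28, 1).astype('float32')        # stand-in for MNIST
y = np.eye(10)[np.random.randint(0, 10, 8)].astype('float32')
history = capsnet.fit([x, y], [y, x.reshape(8, 784)],
                      epochs=1, batch_size=8, verbose=0)
```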

On the test dataset, it was able to achieve 99.09% accuracy. Pretty good, yeah! The reconstructed images also look good. Here are the reconstructed images generated by the decoder network.

Capsule networks show promising results and are yet to be explored thoroughly. There are various directions in which they can be explored further. Research on capsule networks is still at an early stage, but it has given a clear indication that they are worth exploring.

Hope you enjoy reading.

If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve myself. Good-bye until next time.

14 thoughts on “Implementing Capsule Network in Keras”

  1. sandeep

    Hey hi, how can I predict on a new image? I mean, how do I send the inputs to the predict function?

  2. Atul Krishna Singh

    you can simply use following line:

    where x_test are input images.
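A hedged numpy sketch of that prediction step (the `capsule_lengths` array stands in for the first output of the model's `predict` call; the values are illustrative):

```python
import numpy as np

# Hedged sketch: in the post's model, something like
#   label_predicted, image_predicted = model.predict(...)
# returns the capsule lengths and the reconstruction; the reconstruction can
# be ignored for classification. `label_predicted` holds one length per class.
label_predicted = np.array([
    [0.05, 0.90, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.90, 0.05, 0.05],
])
predicted_digits = np.argmax(label_predicted, axis=1)
print(predicted_digits)  # [1 7]
```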

  3. satish

    Hello,

    Thanks for the nice post.

    When I try to run the code, I get the following error:

    ValueError: Can not do batch_dot on inputs with shapes (None, 10, 10, 1152, 16) and (None, 10, 1152, 1152, 16) with axes=[2, 3]. x.shape[2] != y.shape[3] (10 != 1152).

    I cannot understand the exact problem here. Sorry, I am a newbie. Could anyone please help me?

  4. satish

    Hello,

    Can anyone help me to sort this error out:

    ValueError: Can not do batch_dot on inputs with shapes (None, 10, 10, 1152, 16) and (None, 10, 1152, 1152, 16) with axes=[2, 3]. x.shape[2] != y.shape[3] (10 != 1152).

    I get this error when I try to run the DigitCapsuleLayer block of the code, i.e.

    class DigitCapsuleLayer(Layer):
        # creating a layer class in keras
        def __init__(self, **kwargs):
            super(DigitCapsuleLayer, self).__init__(**kwargs)
            self.kernel_initializer = initializers.get('glorot_uniform')

        def build(self, input_shape):
            # initialize weight matrix for each capsule in lower layer
            self.W = self.add_weight(shape=[10, 6*6*32, 16, 8], initializer=self.kernel_initializer, name='weights')
            self.built = True

        def call(self, inputs):
            inputs = K.expand_dims(inputs, 1)
            inputs = K.tile(inputs, [1, 10, 1, 1])
            # matrix multiplication b/w previous layer output and weight matrix
            inputs = K.map_fn(lambda x: K.batch_dot(x, self.W, [2, 3]), elems=inputs)
            b = tf.zeros(shape=[K.shape(inputs)[0], 10, 6*6*32])

            # routing algorithm with updating coupling coefficient c, using scalar product b/w input capsule and output capsule
            for i in range(3-1):
                c = tf.nn.softmax(b, dim=1)
                s = K.batch_dot(c, inputs, [2, 2])
                v = squash(s)
                b = b + K.batch_dot(v, inputs, [2, 3])
            return v

        def compute_output_shape(self, input_shape):
            return tuple([None, 10, 16])

    1. Atul Krishna Singh

      Hi Satish,

      See your code at this line:

      In Keras, batch_dot() is used to compute the dot product between two Keras tensors or variables, where both are processed in batches.

      In the code line I mentioned above, you have specified the target dimensions as [2, 3], which means that x.shape[2] and W.shape[3] should be equal, which is not the case in your code. That's why there is an error.
      Hope this helps.

      1. satish

        But I am using the code given on this web page only; I do not have my own code. So how come I get this error message? I remember a few weeks ago when I tried this code there was no error, but today I tried the same code and I have this issue. Is there something wrong with my Keras version?

        1. Mehmet Ali

          Configuring the Keras version with the following line solved it in my case:

          !pip install -q keras==2.1.2

    1. kang & atul Post author

      you can simply use following line:

      where x_test are input images.

      Take the label_predicted for classification and you can ignore the image_predicted.

  5. Naseer

    How can I use this code for just the classification part? That is, I have MNIST data and I just need to test the classification accuracy, as I do not need the reconstruction part or the reconstruction error.

    1. kang & atul Post author

      Hi Naseer,
      Firstly, the reconstruction part also helps the classification by boosting the pose parameters learnt by the digit capsule layer.
      Still, if you want to try only the classification part, you need to remove the decoder part from the model and create the model accordingly. Then there is no need for a multi-input, multi-output model; a single-input, single-output model will work.
      Hope this helps.

  6. sreagm

    Can I use the same flow as described here to train a capsule network for binary classification task?

  7. Saim

    I am facing difficulty visualizing and understanding how the batch_dot between the inputs and the weights happens. Could you please explain? I haven't found any solution on the internet; there is some explanation of 2D multiplication, but not of 4D.

