
PEPs

PEP stands for Python Enhancement Proposal. According to Python.org:

“A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment. The PEP should provide a concise technical specification of the feature and a rationale for the feature.”

Anyone can submit their own PEP, which is then thoroughly peer-reviewed by the community.

PEP numbers, like PEP 0, PEP 8, etc., are assigned by the PEP editors and, once assigned, are never changed. (See here for the complete PEP list.)

According to PEP 1, there are three different types of PEPs:

  • Standards Track: Describes a new feature or implementation.
  • Informational: Provides general guidelines or information to the community but doesn’t propose a new feature.
  • Process: Describes a process surrounding Python, such as procedures and guidelines. Unlike informational PEPs, you are not free to ignore them.

A few PEPs are particularly worth reading:

  • PEP 8: the style guide for Python code.
  • PEP 20: The Zen of Python (a list of 19 aphorisms that briefly capture the philosophy behind Python).
  • PEP 257: Docstring Conventions.
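As a quick aside, the Zen of Python from PEP 20 ships with every Python installation as an easter egg; you can print it with a one-line import:

```python
# Prints the 19 aphorisms of PEP 20 (The Zen of Python)
import this
```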

So, if you spot a gap in the language or its processes, write your own PEP and submit it for review. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Implementing Capsule Network in Keras

In the last blog, we saw what a capsule network is and how it can overcome the problems associated with convolutional neural networks. In this blog, we will implement a capsule network in Keras.

You can find the full code here.

Here, we will use the handwritten digit dataset (MNIST) and train the capsule network to classify the digits. The MNIST digit dataset consists of grayscale images of size 28*28.

The Capsule Network architecture is similar to a convolutional neural network except for the capsule layers. We can break the implementation into the following steps:

  1. Initial convolutional layer
  2. Primary capsule layer
  3. Digit capsule layer
  4. Decoder network
  5. Loss Functions
  6. Training and testing of model

Initial Convolution Layer:

Initially, we use a convolution layer to detect low-level features of the image. It uses 256 filters, each of size 9*9, with stride 1 and ReLU activation. The input image is 28*28; after applying this layer, the output size is 20*20*256. A minimal sketch of this layer is shown below.
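This sketch assumes TensorFlow's Keras API; the variable names are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

# 28x28 grayscale MNIST input
inputs = layers.Input(shape=(28, 28, 1))

# 256 filters of size 9x9, stride 1, ReLU: (28 - 9)/1 + 1 = 20 -> output 20x20x256
conv1 = layers.Conv2D(filters=256, kernel_size=9, strides=1,
                      padding='valid', activation='relu')(inputs)
```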

Primary Capsule Layer:

The output of the previous layer is passed through 256 filters, each of size 9*9 with a stride of 2, producing an output of size 6*6*256. This output is then reshaped into 8-dimensional vectors, giving 6*6*32 = 1152 capsules, each 8-dimensional. Each capsule is then passed through a non-linear function (squash) so that the length of its output vector lies between 0 and 1. A sketch follows.
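Continuing the sketch above; the squash implementation is an assumption based on the formula in the paper:

```python
def squash(s, axis=-1, epsilon=1e-7):
    # Shrinks vector length into [0, 1) without changing its direction
    squared_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    safe_norm = tf.sqrt(squared_norm + epsilon)
    return (squared_norm / (1.0 + squared_norm)) * (s / safe_norm)

# 256 filters of size 9x9, stride 2: (20 - 9)/2 + 1 = 6 -> output 6x6x256
conv2 = layers.Conv2D(filters=256, kernel_size=9, strides=2,
                      padding='valid', activation='relu')(conv1)

# Reshape into 6*6*32 = 1152 capsules of 8 dimensions each, then squash
primary_caps = layers.Reshape((1152, 8))(conv2)
primary_caps = layers.Lambda(squash)(primary_caps)
```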

Digit Capsule Layer:

The logic and algorithm used for this layer were explained in the previous blog; here we look at what is needed in code. We need to write a custom layer in Keras. It takes the 1152*8 capsules as input and produces an output of size 10*16: 10 capsules, one per output class, each a 16-dimensional vector. Each of these 10 capsules is then converted into a single value (its length) with a lambda layer to predict the output class. A sketch of such a layer is shown below.
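This is a condensed sketch of a digit-capsule layer with dynamic routing, reusing the squash function defined above. The class name, tensor manipulations and the three routing iterations are assumptions based on the paper, not the original post's exact code:

```python
class DigitCaps(layers.Layer):
    def __init__(self, num_capsules=10, dim_capsule=16, routings=3, **kwargs):
        super().__init__(**kwargs)
        self.num_capsules = num_capsules
        self.dim_capsule = dim_capsule
        self.routings = routings

    def build(self, input_shape):
        # input_shape: (batch, 1152, 8)
        self.num_input_caps, self.input_dim = input_shape[1], input_shape[2]
        # One 16x8 transformation matrix per (input capsule, output capsule) pair
        self.W = self.add_weight(
            shape=(1, self.num_input_caps, self.num_capsules,
                   self.dim_capsule, self.input_dim),
            initializer='glorot_uniform', name='W')
        self.built = True

    def call(self, inputs):
        # (batch, 1152, 8) -> (batch, 1152, 10, 8, 1)
        u = tf.tile(tf.expand_dims(tf.expand_dims(inputs, 2), -1),
                    [1, 1, self.num_capsules, 1, 1])
        # Prediction vectors u_hat = W.u : (batch, 1152, 10, 16, 1)
        u_hat = tf.matmul(self.W, u)
        # Routing logits b start at zero
        b = tf.zeros_like(u_hat[:, :, :, :1, :])
        for i in range(self.routings):
            c = tf.nn.softmax(b, axis=2)  # coupling coefficients over output capsules
            v = squash(tf.reduce_sum(c * u_hat, axis=1, keepdims=True), axis=-2)
            if i < self.routings - 1:
                # Agreement: scalar product between predictions and output capsules
                b += tf.reduce_sum(u_hat * v, axis=-2, keepdims=True)
        return tf.squeeze(v, axis=[1, -1])  # (batch, 10, 16)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.num_capsules, self.dim_capsule)

digit_caps = DigitCaps()(primary_caps)
# Length of each capsule vector serves as the class score
out_caps = layers.Lambda(
    lambda z: tf.sqrt(tf.reduce_sum(tf.square(z), -1)), name='out_caps')(digit_caps)
```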

Decoder Network:

To further refine the pose parameters learned by the digit capsule layer, we can add a decoder network that reconstructs the input image. The decoder network is fed an input of size 10*16 (the digit capsule layer output) and reconstructs the original 28*28 image. The decoder consists of 3 dense layers having 512, 1024 and 784 nodes respectively.

During training, the input to the decoder is the output of the digit capsule layer masked with the original labels: every vector except the one corresponding to the correct label is multiplied by zero, so the decoder is only trained on the correct digit capsule. At test time, the input to the decoder is the same digit capsule output, but masked with the longest vector in that layer. Let's see the code.
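A sketch of the two masking paths and the decoder; the labels input and the 160 = 10*16 reshape are assumptions consistent with the shapes above:

```python
labels = layers.Input(shape=(10,))  # one-hot ground-truth labels

# Training-time mask: zero out every capsule except the true class
masked_by_label = layers.Lambda(
    lambda x: tf.reshape(x[0] * tf.expand_dims(x[1], -1), (-1, 160)))([digit_caps, labels])

# Test-time mask: keep only the longest (most probable) capsule
def mask_by_max(z):
    lengths = tf.sqrt(tf.reduce_sum(tf.square(z), -1))
    one_hot = tf.one_hot(tf.argmax(lengths, axis=1), depth=10)
    return tf.reshape(z * tf.expand_dims(one_hot, -1), (-1, 160))

masked_by_max = layers.Lambda(mask_by_max)(digit_caps)

# Shared decoder: three dense layers with 512, 1024 and 784 nodes
decoder = tf.keras.Sequential([
    layers.Dense(512, activation='relu', input_dim=160),
    layers.Dense(1024, activation='relu'),
    layers.Dense(784, activation='sigmoid'),
    layers.Reshape((28, 28, 1))
])
```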

Loss Functions:

It uses two loss functions: a margin (probabilistic) loss used for classifying the digit images, and a reconstruction loss, which is the mean squared error. The margin loss is simple to understand once you look at the following code.
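A sketch of the margin loss from the paper, using its m+ = 0.9, m− = 0.1 and λ = 0.5 constants:

```python
def margin_loss(y_true, y_pred):
    # L_k = T_k * max(0, 0.9 - ||v_k||)^2 + 0.5 * (1 - T_k) * max(0, ||v_k|| - 0.1)^2
    L = y_true * tf.square(tf.maximum(0.0, 0.9 - y_pred)) + \
        0.5 * (1.0 - y_true) * tf.square(tf.maximum(0.0, y_pred - 0.1))
    return tf.reduce_mean(tf.reduce_sum(L, axis=1))
```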

Training and Testing of model:

Now let's define the training and testing models and train on the MNIST digit dataset:
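A sketch tying the pieces together; the optimizer, epoch count and the 0.392 reconstruction weight (0.0005 per pixel * 784 pixels, following common implementations of the paper) are assumptions:

```python
train_model = tf.keras.Model([inputs, labels], [out_caps, decoder(masked_by_label)])
eval_model = tf.keras.Model(inputs, [out_caps, decoder(masked_by_max)])

train_model.compile(optimizer='adam',
                    loss=[margin_loss, 'mse'],
                    loss_weights=[1.0, 0.392])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# The decoder target is the input image itself
train_model.fit([x_train, y_train], [y_train, x_train],
                batch_size=128, epochs=10,
                validation_data=([x_test, y_test], [y_test, x_test]))
```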

On the test dataset it was able to achieve 99.09% accuracy. Pretty good, yeah! The reconstructed images also look good. Here are the reconstructed images generated by the decoder network.

Capsule networks have shown promising results but are yet to be explored thoroughly. Research on capsule networks is still at an early stage, but it has given a clear indication that the idea is worth pursuing.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Capsule Networks

Since the introduction of AlexNet in 2012, convolutional neural networks (CNNs) have been the go-to approach for a wide range of image problems. CNNs perform really well in image classification, object detection, semantic segmentation and many more tasks.


But are CNNs the best solution for image problems? Do they translate all the features present in an image when predicting the output?

Problems with Convolutional Neural Networks:

  1. CNNs use pooling layers to reduce the number of parameters and speed up computation. In the process, they lose some useful feature information.
  2. CNNs also require a huge amount of training data; otherwise they will not reach high accuracy on the test dataset.
  3. CNNs essentially try to achieve “viewpoint invariance”, meaning that small changes to the input should not change the output. They also do not store the relative spatial relationships between features.

To solve these problems we need a better approach. That is where the capsule network comes in: a network that has given early indications that it can solve the problems associated with convolutional neural networks. Geoffrey E. Hinton et al. published a paper named “Dynamic Routing Between Capsules”, in which they introduced the capsule network and the dynamic routing algorithm.

What is a Capsule Network?

A capsule is a group of neurons that uses vectors to represent an object or object part. The length of the vector represents the presence of the object, and the orientation of the vector represents its pose (size, position, orientation, etc.). A group of these capsules forms a capsule layer, and these layers in turn form a capsule network. It has some advantages over CNNs:

  1. Capsule networks try to achieve “equivariance”: when the input changes a little, the output also changes, but the length of the vector stays the same, still predicting the presence of the same object.
  2. Capsule networks also require less training data, because they preserve the spatial relationships between features.
  3. Capsule networks do not use pooling layers, which removes the problem of losing useful feature information.

How a Capsule Network works?

Usually in CNNs we deal with layers, i.e. one layer passes information to the subsequent layer and so on. CapsNet follows the same flow, as shown below.

The diagram shown above represents the network architecture used in the paper for the MNIST dataset. The initial layer uses convolution to extract low-level features from the image and passes them to a primary capsule layer.

The primary capsule layer reshapes the output of the previous convolution layer into capsules containing vectors of equal dimension. The length of each of these vectors represents the probability that an object is present, which is why we need a non-linear “squashing” function to scale the length of every vector to between 0 and 1. The squashing function from the paper is:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) * (s_j / ||s_j||)

where s_j is the input vector, ||s_j|| is its norm and v_j is the output vector. This is the output of the primary capsule layer. Capsules in the next layer are generated using the dynamic routing algorithm, described next.

Routing Algorithm:

The main feature of the routing algorithm is agreement between capsules: lower-level capsules send their values to the higher-level capsules they agree with.

Let’s take the example of an image of a face. Suppose there are four capsules in a lower layer, representing the mouth, nose, left eye and right eye respectively. If all four agree on the same face position, each sends its values to the output-layer capsule signalling the presence of a face.

To produce the output of the routing capsules (capsules in the higher layer), the output of the lower layer (u) is first multiplied by a weight matrix W and then weighted by a coupling coefficient c. This coefficient determines which capsule from the lower layer sends its output to which capsule in the higher layer.

The coupling coefficients c are learned iteratively. The sum of all the c for a capsule ‘i’ in the lower layer is equal to 1, which maintains the probabilistic interpretation of vector length as the probability of the presence of an object. The c values are obtained by applying a softmax to weights b, whose initial values are set to zero.

Routing agreement is determined by updating the weights b: the previous b is incremented by the scalar product between the capsule in the higher layer and the capsule in the lower layer (shown in line 7 of the algorithm below).
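The original algorithm figure comes from the paper; here is a close paraphrase of its routing procedure, where u_hat are the prediction vectors W*u and r is the number of routing iterations:

```
 1: procedure ROUTING(u_hat, r, l)
 2:   for all capsules i in layer l and capsules j in layer (l+1): b_ij <- 0
 3:   for r iterations do
 4:     for all capsules i in layer l: c_i <- softmax(b_i)
 5:     for all capsules j in layer (l+1): s_j <- sum_i c_ij * u_hat_j|i
 6:     for all capsules j in layer (l+1): v_j <- squash(s_j)
 7:     for all capsules i in layer l and capsules j in layer (l+1): b_ij <- b_ij + u_hat_j|i . v_j
 8:   return v_j
```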

To further improve the capsule layer's estimates, the authors added a decoder network, which tries to reconstruct the original image from the output of the digit capsule layer. It simply adds a few fully connected layers on top of the 16-dimensional capsule outputs.

Now we have seen the basic concepts of a capsule network. The best way to gain a deeper understanding is to implement the code, which you can see in the next blog.

The Next Blog: Implementing Capsule Network in Keras

Referenced Research Paper: Dynamic Routing Between Capsules

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Feeding output of a given intermediate layer in Keras as the input to another network

Keras is a high-level neural network library built for fast experimentation, user friendliness and easy extensibility. It is a highly recommended library for beginners in neural networks. In this blog we will learn how to use the output of an intermediate layer of a neural network as input to another network.

Sometimes you might get stuck while using the output of an intermediate layer, with errors like ‘graph disconnected’. Let's see how to solve this through code.

First, let's create an autoencoder model. If you are not aware of what an autoencoder is, you can follow this blog.
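A minimal sketch of such an autoencoder; the layer sizes and the names encoder_outputs, dense_layer_d and decoder_output are assumptions used throughout the rest of this post:

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# ----- Encoder -----
input_img = Input(shape=(784,))
encoded = Dense(512, activation='relu')(input_img)
encoded = Dense(128, activation='relu')(encoded)
encoder_outputs = Dense(32, activation='relu')(encoded)  # the encoder's final layer

# ----- Decoder -----
dense_layer_d = Dense(128, activation='relu')(encoder_outputs)
decoded = Dense(512, activation='relu')(dense_layer_d)
decoder_output = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoder_output)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```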

In the sketch above we have created an autoencoder model, with encoder_outputs holding the encoder's final activations. Now if you want to create a decoder network from this model, with the encoder_outputs layer as its input, what should you do? A beginner will do something like this:
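The naive attempt, kept here as the broken version for illustration:

```python
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

# Create a fresh input and point the model straight at decoder_output
decoder_input = Input(shape=(32,))
decoder = Model(decoder_input, decoder_output)  # ValueError: Graph disconnected
```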

But this will throw a ‘graph disconnected’ error. This is because the dense_layer_d layer is already connected to a previous layer (encoder_outputs) in the original graph, while the new decoder_input is not connected to it at all. To solve this problem you can do something like this:

Earlier we created the model autoencoder. Now, to build a decoder starting from its intermediate layer, use the following steps:

  1. Find the index of the first decoder layer (in the sketch above, dense_layer_d is the 3rd layer from the end, so index -3).
  2. Use autoencoder.layers to get that layer.
  3. Iterate through the layers of the autoencoder model from that index onwards, up to and including the decoder_output layer.
  4. Then create a model using decoder_input and the last iterated layer, as shown in the sketch below.
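A sketch of these steps, reusing the decoder_input defined above:

```python
# Re-apply the trained decoder layers, one by one, to the new input
x = decoder_input
for layer in autoencoder.layers[-3:]:  # from dense_layer_d to decoder_output
    x = layer(x)

decoder = Model(decoder_input, x)
decoder.summary()
```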

This successfully creates a decoder model that takes the output of the intermediate layer encoder_outputs as its input. And that's it!

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Custom Layers in Keras

A model in Keras is composed of layers. There are built-in layers in Keras that you can directly import, like Conv2D, MaxPooling2D, Flatten, Reshape, etc. But sometimes you need to add your own custom layer. In this blog, we will learn how to add a custom layer in Keras.

There are basically two types of custom layers that you can add in Keras.

Lambda Layer

A Lambda layer is useful whenever you need to perform some operation on the previous layer without adding any trainable weights.

Let's say you want to apply your own activation function (one not built into Keras) to a layer. You first define a function that takes the output of the previous layer as input and applies the custom activation to it. You then pass this function to a Lambda layer, as in the sketch below.
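A minimal sketch, assuming a made-up “scaled tanh” activation purely for illustration:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

def scaled_tanh(x):
    # Hypothetical custom activation: tanh scaled to (-2, 2)
    return 2.0 * tf.tanh(x)

inputs = Input(shape=(64,))
x = Dense(32)(inputs)           # no built-in activation here
x = Lambda(scaled_tanh)(x)      # apply the custom activation
model = Model(inputs, x)
```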

Custom Class Layer

Sometimes you want to create your own layer with trainable weights, something not built into Keras. In that case you need to create a custom layer class and define the following methods:

  1. __init__ method to initialize class variables and super class variables.
  2. build method to define the weights.
  3. call method where you perform all the layer's operations.
  4. compute_output_shape method to define the output shape of this custom layer.

Let's see an example of a custom layer class. Here you only need to focus on the architecture of the class.
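A sketch of a simple trainable layer (a plain linear transform, chosen as an illustration; the class name is hypothetical):

```python
import tensorflow as tf
from tensorflow.keras import layers

class MyDense(layers.Layer):
    def __init__(self, output_dim, **kwargs):
        # Initialize class variables and the super class
        self.output_dim = output_dim
        super().__init__(**kwargs)

    def build(self, input_shape):
        # Define the trainable weights of the layer
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1], self.output_dim),
                                      initializer='glorot_uniform',
                                      trainable=True)
        self.built = True  # marks the layer as built

    def call(self, inputs):
        # All of the layer's logic lives here
        return tf.matmul(inputs, self.kernel)

    def compute_output_shape(self, input_shape):
        # Output shape of this custom layer
        return (input_shape[0], self.output_dim)
```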

In the build method, setting self.built = True is necessary. Also, you can see that all the logic is written inside the call(self, inputs) method, and compute_output_shape defines the output shape of the layer.

You can also pass multiple input tensors to this custom layer. The only thing you need to do is pass the inputs as a list; inside call(self, inputs), inputs will then be a list of tensors.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Saving and Loading models in Keras

Generally, a deep learning model takes a large amount of time to train, so it's better to know how to save a trained model. In this blog we will learn how to save a whole Keras model, i.e. its architecture, weights and optimizer state.

Let's first create a model in Keras. This is a simple autoencoder model; if you need to know more about autoencoders, please refer to this blog.
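A minimal sketch of such a model (the layer sizes are illustrative):

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_img = Input(shape=(784,))
encoded = Dense(64, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```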

Above we have created a Keras model named “autoencoder”. Now let's see how to save this model.

Saving and loading only architecture of a model

In Keras, you can save and load the architecture of a model in two formats: JSON or YAML. Models saved in these formats are human readable and can be edited if needed, for example:
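A sketch using the JSON variant; the to_yaml/model_from_yaml calls are analogous, though YAML support has been removed in recent TensorFlow releases:

```python
from tensorflow.keras.models import model_from_json

# Serialize the architecture (no weights) to a JSON string
json_string = autoencoder.to_json()
with open('autoencoder_architecture.json', 'w') as f:
    f.write(json_string)

# Later: rebuild an untrained model with the same architecture
with open('autoencoder_architecture.json') as f:
    new_model = model_from_json(f.read())
```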

Saving and Loading Weights of a Keras Model

Along with the model architecture, you will also need the model weights to predict outputs from the trained model:
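A sketch of saving and restoring only the weights (the HDF5 filename is illustrative):

```python
# Save only the trained weights
autoencoder.save_weights('autoencoder_weights.h5')

# Restore them into a model with the same architecture
new_model.load_weights('autoencoder_weights.h5')
```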

Saving and Loading Both Architecture and Weights in one File
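A sketch of saving everything in a single HDF5 file and loading it back:

```python
from tensorflow.keras.models import load_model

# Save architecture, weights, loss, optimizer and optimizer state together
autoencoder.save('autoencoder_model.h5')

# Later: restore the full model and resume training or predict
autoencoder = load_model('autoencoder_model.h5')
```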

This will save the following four things in the “autoencoder_model.h5” file:

  1. Model Architecture
  2. Model Weights
  3. Loss and Optimizer
  4. State of the optimizer, allowing you to resume training where you left off.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Log Transformation

Log transformation means replacing each pixel value with its logarithm. The general form of log transformation function is

s = T(r) = c*log(1+r)

where ‘s’ and ‘r’ are the output and input pixel values, and c is a scaling constant given by the following expression (for an 8-bit image):

c = 255/(log(1 + max_input_pixel_value))

The value of c is chosen so that the maximum output value matches the bit depth used, e.g. for an 8-bit image, c is chosen such that the maximum output value is 255.

For an 8-bit image, log transformation looks like this

Clearly, the low intensity values in the input image are mapped to a wider range of output levels. The opposite is true for the higher values.

Applications:

  • Expands the dark pixels in the image while compressing the brighter pixels
  • Compresses the dynamic range (display of Fourier transform).

Dynamic range refers to the ratio of the maximum to minimum intensity values. When the dynamic range of an image exceeds that of the display device (as with the Fourier transform), the lower values are suppressed. To overcome this, we use the log transform: it first compresses the dynamic range, and the image is then rescaled to the dynamic range of the display device. In this way, lower values are enhanced and the image shows significantly more detail.

The code below shows how to apply the log transform using OpenCV and Python.
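A minimal sketch (the input filename is illustrative):

```python
import cv2
import numpy as np

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

# c = 255 / log(1 + max input pixel value)
c = 255 / np.log(1 + np.max(img))

# s = c * log(1 + r), computed in float and then mapped back to 8-bit
log_image = c * np.log(1 + img.astype(np.float64))
log_image = np.uint8(log_image)

cv2.imwrite('log_transformed.jpg', log_image)
```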

Thus, a logarithmic transform is appropriate when we want to enhance the low pixel values at the expense of loss of information in the high pixel values.

Be careful: if most of the detail is present in the high pixel values, applying the log transform results in a loss of information, as shown below.

Before
After

In the next blog, we will discuss Power law or Gamma transformation. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Image Negatives or inverting images using OpenCV

Image negatives: most of you might have heard this term from the good old days, when negatives were used to produce photographic prints. Film photography has not yet become obsolete, as some wedding photographers are still shooting film, but because one has to pay for film rolls and processing fees, most people have switched to digital.

I recently heard of the Foveon X3 direct image sensor, which claims to combine the power of a digital sensor with the essence of film. (Check here)

An image negative is produced by subtracting each pixel from the maximum intensity value, e.g. for an 8-bit image, the maximum intensity value is 2^8 − 1 = 255, so each pixel is subtracted from 255 to produce the output image.

Thus, the transformation function used in image negative is

s = T(r) = L – 1 – r

where L − 1 is the maximum intensity value, and s and r are the output and input pixel values respectively.

For grayscale images, light areas appear dark and vice versa. For color images, colors are replaced by their complementary colors. Thus, red areas appear cyan, greens appear magenta, and blues appear yellow, and vice versa.
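Method 1

A direct sketch of s = L − 1 − r using NumPy broadcasting (the filename is illustrative):

```python
import cv2

img = cv2.imread('input.jpg')

# s = L - 1 - r, with L = 256 for an 8-bit image
negative = 255 - img

cv2.imwrite('negative.jpg', negative)
```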

The output looks like this

Method 2

OpenCV provides a built-in function, cv2.bitwise_not(), that inverts every bit of an array. It takes the original image as input and outputs the inverted image. Below is a sketch of this.
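A minimal sketch, assuming an 8-bit input image:

```python
import cv2

img = cv2.imread('input.jpg')

# Bitwise NOT flips every bit: for uint8 this equals 255 - pixel
negative = cv2.bitwise_not(img)

cv2.imwrite('negative_bitwise.jpg', negative)
```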

There is a long-running debate about whether black-on-white or white-on-black is better. To my knowledge, the image negative favors black-on-white, so it is suited to enhancing the white or gray information embedded in the dark regions of an image, especially when the black areas dominate in size.

Application: in grayscale images with a black background, the foreground gray levels are not clearly visible. By converting the background to white, the gray levels become more visible.

In the next blog, we will discuss Log transformations in detail. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Compression of data using Autoencoders

In the last blog, we discussed what autoencoders are. In this blog, we will learn how autoencoders can be used to compress data and reconstruct the original data.

Here I have used the MNIST dataset, which contains digit images (0 to 9), about 45 MB in total. Let's see the code to download the data using Python.
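A sketch using the dataset loader built into Keras:

```python
from tensorflow.keras.datasets import mnist

# Downloads MNIST on first use and caches it locally
(x_train, _), (x_test, _) = mnist.load_data()

# Scale to [0, 1] and add a channel axis for the convolutional layers
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
```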

Since we want to compress the dataset and reconstruct the original data from it, we first create a convolutional autoencoder. Let's see the code:
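A sketch of a small convolutional autoencoder (the layer sizes are illustrative):

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_img = Input(shape=(28, 28, 1))

# Encoder: 28x28x1 -> 7x7x8
x = Conv2D(16, 3, activation='relu', padding='same')(input_img)
x = MaxPooling2D(2, padding='same')(x)
x = Conv2D(8, 3, activation='relu', padding='same')(x)
encoded = MaxPooling2D(2, padding='same')(x)

# Decoder: 7x7x8 -> 28x28x1
x = Conv2D(8, 3, activation='relu', padding='same')(encoded)
x = UpSampling2D(2)(x)
x = Conv2D(16, 3, activation='relu', padding='same')(x)
x = UpSampling2D(2)(x)
decoded = Conv2D(1, 3, activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```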

From this autoencoder model, I created separate encoder and decoder models: the encoder model compresses the data, and the decoder model is used to reconstruct the original data. Then the autoencoder model is trained:
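A sketch of carving out the two sub-models and training (the epoch count is illustrative):

```python
encoder = Model(input_img, encoded)

# Decoder: re-apply the last 5 layers of the autoencoder to a new input
decoder_input = Input(shape=(7, 7, 8))
x = decoder_input
for layer in autoencoder.layers[-5:]:
    x = layer(x)
decoder = Model(decoder_input, x)

autoencoder.fit(x_train, x_train, epochs=10, batch_size=128,
                validation_data=(x_test, x_test))
```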

Using the encoder model, we can save the compressed data into a text file; it comes to about 18 MB, much less than the original 45 MB.
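A sketch of writing the compressed representations out with NumPy, flattening each 7*7*8 code into one row (the filename and float format are illustrative):

```python
import numpy as np

compressed = encoder.predict(x_test)                 # shape (10000, 7, 7, 8)
np.savetxt('compressed_data.txt',
           compressed.reshape(len(compressed), -1),  # one code per row
           fmt='%.4f')
```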

Now, how do we reconstruct the compressed data when the original is needed? The simple solution is to save the decoder model and its weights, which can later be used to reconstruct the compressed data. Let's save the decoder model and its weights:
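A sketch, mirroring the save options from the saving/loading post above:

```python
# Save the decoder architecture and weights separately
with open('decoder_architecture.json', 'w') as f:
    f.write(decoder.to_json())
decoder.save_weights('decoder_weights.h5')
```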

Finally, we have our compressed data and the decoder model. Let's see how we can reconstruct the original data using these two:
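A sketch of the reconstruction path, matching the filenames used above:

```python
import numpy as np
from tensorflow.keras.models import model_from_json

# Load the saved decoder and its weights
with open('decoder_architecture.json') as f:
    decoder = model_from_json(f.read())
decoder.load_weights('decoder_weights.h5')

# Load the compressed codes and decode them back into images
codes = np.loadtxt('compressed_data.txt').reshape(-1, 7, 7, 8)
reconstructed = decoder.predict(codes)  # shape (N, 28, 28, 1)
```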

Above are the outputs from the decoder model.

It looks fascinating to compress data to a smaller size and get the same data back when we need it, but there are some real problems with this method.

The problem is that autoencoders cannot generalize: an autoencoder can only reconstruct images similar to those it was trained on. But with the advancement of deep learning, the days when this kind of learned compression becomes practical may not be far away.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Sparse Autoencoders

In the last blog we looked at autoencoders and their applications. In this blog we will learn about one of their variants, the sparse autoencoder.

In every autoencoder, we try to learn a compressed representation of the input. Take a simple autoencoder with an input vector of dimension 1000, compressed into 500 hidden units and reconstructed back into 1000 outputs. The hidden units will learn the correlated features present in the input. But what if the input features are completely random? Then it will be difficult for the hidden units to learn any interesting structure in the data. In that situation, we can increase the number of hidden units and add sparsity constraints. Now the question is: what are sparsity constraints?

When a sparsity constraint is added to a hidden layer, only some units (those with large activation values) are activated and the rest are driven to zero. So even with a large number of hidden units (as in the example above), only a few fire for any given input, and the network still learns useful structure in the data.

The simplest way to add a sparsity constraint is in Keras: you simply add an activity_regularizer to a layer and it does the rest, as in the sketch below.
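A minimal sketch, using an L1 activity regularizer to penalize large activations (the layer sizes and the 1e-5 penalty are illustrative):

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers

input_img = Input(shape=(1000,))

# The L1 activity penalty pushes most hidden activations towards zero
encoded = Dense(500, activation='relu',
                activity_regularizer=regularizers.l1(1e-5))(input_img)
decoded = Dense(1000, activation='sigmoid')(encoded)

sparse_autoencoder = Model(input_img, decoded)
sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```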

But if you want to add sparsity constraints by writing your own function, you can follow the reference given below.

References: Sparse Autoencoders

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.