
PEPs

PEP stands for Python Enhancement Proposal. According to Python.org:

“A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment. The PEP should provide a concise technical specification of the feature and a rationale for the feature.”

Anyone can submit their own PEP, which is then thoroughly peer-reviewed by the community.

PEP numbers, like PEP 0, PEP 8, etc., are assigned by the PEP editors and, once assigned, are never changed. (See here for the complete PEP list.)

According to PEP 1, there are three different types of PEPs:

  • Standards Track: Describes a new feature or implementation.
  • Informational: Provides general guidelines or information to the community but doesn’t propose a new feature.
  • Process: Describes a process surrounding Python, such as procedures and guidelines. Unlike informational PEPs, you are not free to ignore them.

A few PEPs are particularly worth reading:

  • PEP 8: the style guide for Python code.
  • PEP 20: The Zen of Python (a list of 19 aphorisms that briefly capture the philosophy behind Python).
  • PEP 257: Docstring Conventions.
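As a quick aside, the Zen of Python from PEP 20 ships with every Python installation as an easter egg; you can print it with a one-line import:

```python
# Prints the 19 aphorisms of PEP 20 (The Zen of Python)
import this
```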

So, if you spot a gap in the language or its processes, write your own PEP and submit it for review. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Implementing Capsule Network in Keras

In the last blog, we saw what a capsule network is and how it can overcome the problems associated with convolutional neural networks. In this blog, we will implement a capsule network in Keras.

You can find the full code here.

Here, we will use the handwritten digit dataset (MNIST) and train the capsule network to classify the digits. The MNIST digit dataset consists of grayscale images of size 28*28.

The Capsule Network architecture is similar to a convolutional neural network except for the capsule layers. We can break the implementation into the following steps:

  1. Initial convolutional layer
  2. Primary capsule layer
  3. Digit capsule layer
  4. Decoder network
  5. Loss Functions
  6. Training and testing of model

Initial Convolution Layer:

Initially, we use a convolution layer to detect low-level features of the image. It uses 256 filters, each of size 9*9, with stride 1 and ReLU activation. The input image is 28*28; after applying this layer, the output size is 20*20*256. A minimal sketch of this layer is shown below.
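This sketch assumes TensorFlow's Keras API; the variable names are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

# 28x28 grayscale MNIST input
inputs = layers.Input(shape=(28, 28, 1))

# 256 filters of size 9x9, stride 1, ReLU: (28 - 9)/1 + 1 = 20 -> output 20x20x256
conv1 = layers.Conv2D(filters=256, kernel_size=9, strides=1,
                      padding='valid', activation='relu')(inputs)
```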

Primary Capsule Layer:

The output of the previous layer is passed through 256 filters, each of size 9*9 with a stride of 2, producing an output of size 6*6*256. This output is then reshaped into 8-dimensional vectors, giving 6*6*32 = 1152 capsules, each 8-dimensional. Each capsule is then passed through a non-linear function (squash) so that the length of its output vector lies between 0 and 1. A sketch follows.
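Continuing the sketch above; the squash implementation is an assumption based on the formula in the paper:

```python
def squash(s, axis=-1, epsilon=1e-7):
    # Shrinks vector length into [0, 1) without changing its direction
    squared_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    safe_norm = tf.sqrt(squared_norm + epsilon)
    return (squared_norm / (1.0 + squared_norm)) * (s / safe_norm)

# 256 filters of size 9x9, stride 2: (20 - 9)/2 + 1 = 6 -> output 6x6x256
conv2 = layers.Conv2D(filters=256, kernel_size=9, strides=2,
                      padding='valid', activation='relu')(conv1)

# Reshape into 6*6*32 = 1152 capsules of 8 dimensions each, then squash
primary_caps = layers.Reshape((1152, 8))(conv2)
primary_caps = layers.Lambda(squash)(primary_caps)
```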

Digit Capsule Layer:

The logic and algorithm used for this layer were explained in the previous blog; here we look at what is needed in code. We need to write a custom layer in Keras. It takes the 1152*8 capsules as input and produces an output of size 10*16: 10 capsules, one per output class, each a 16-dimensional vector. Each of these 10 capsules is then converted into a single value (its length) with a lambda layer to predict the output class. A sketch of such a layer is shown below.
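This is a condensed sketch of a digit-capsule layer with dynamic routing, reusing the squash function defined above. The class name, tensor manipulations and the three routing iterations are assumptions based on the paper, not the original post's exact code:

```python
class DigitCaps(layers.Layer):
    def __init__(self, num_capsules=10, dim_capsule=16, routings=3, **kwargs):
        super().__init__(**kwargs)
        self.num_capsules = num_capsules
        self.dim_capsule = dim_capsule
        self.routings = routings

    def build(self, input_shape):
        # input_shape: (batch, 1152, 8)
        self.num_input_caps, self.input_dim = input_shape[1], input_shape[2]
        # One 16x8 transformation matrix per (input capsule, output capsule) pair
        self.W = self.add_weight(
            shape=(1, self.num_input_caps, self.num_capsules,
                   self.dim_capsule, self.input_dim),
            initializer='glorot_uniform', name='W')
        self.built = True

    def call(self, inputs):
        # (batch, 1152, 8) -> (batch, 1152, 10, 8, 1)
        u = tf.tile(tf.expand_dims(tf.expand_dims(inputs, 2), -1),
                    [1, 1, self.num_capsules, 1, 1])
        # Prediction vectors u_hat = W.u : (batch, 1152, 10, 16, 1)
        u_hat = tf.matmul(self.W, u)
        # Routing logits b start at zero
        b = tf.zeros_like(u_hat[:, :, :, :1, :])
        for i in range(self.routings):
            c = tf.nn.softmax(b, axis=2)  # coupling coefficients over output capsules
            v = squash(tf.reduce_sum(c * u_hat, axis=1, keepdims=True), axis=-2)
            if i < self.routings - 1:
                # Agreement: scalar product between predictions and output capsules
                b += tf.reduce_sum(u_hat * v, axis=-2, keepdims=True)
        return tf.squeeze(v, axis=[1, -1])  # (batch, 10, 16)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.num_capsules, self.dim_capsule)

digit_caps = DigitCaps()(primary_caps)
# Length of each capsule vector serves as the class score
out_caps = layers.Lambda(
    lambda z: tf.sqrt(tf.reduce_sum(tf.square(z), -1)), name='out_caps')(digit_caps)
```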

Decoder Network:

To further refine the pose parameters learned by the digit capsule layer, we can add a decoder network that reconstructs the input image. The decoder network is fed an input of size 10*16 (the digit capsule layer output) and reconstructs the original 28*28 image. The decoder consists of 3 dense layers having 512, 1024 and 784 nodes respectively.

During training, the input to the decoder is the output of the digit capsule layer masked with the original labels: every vector except the one corresponding to the correct label is multiplied by zero, so the decoder is only trained on the correct digit capsule. At test time, the input to the decoder is the same digit capsule output, but masked with the longest vector in that layer. Let's see the code.
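A sketch of the two masking paths and the decoder; the labels input and the 160 = 10*16 reshape are assumptions consistent with the shapes above:

```python
labels = layers.Input(shape=(10,))  # one-hot ground-truth labels

# Training-time mask: zero out every capsule except the true class
masked_by_label = layers.Lambda(
    lambda x: tf.reshape(x[0] * tf.expand_dims(x[1], -1), (-1, 160)))([digit_caps, labels])

# Test-time mask: keep only the longest (most probable) capsule
def mask_by_max(z):
    lengths = tf.sqrt(tf.reduce_sum(tf.square(z), -1))
    one_hot = tf.one_hot(tf.argmax(lengths, axis=1), depth=10)
    return tf.reshape(z * tf.expand_dims(one_hot, -1), (-1, 160))

masked_by_max = layers.Lambda(mask_by_max)(digit_caps)

# Shared decoder: three dense layers with 512, 1024 and 784 nodes
decoder = tf.keras.Sequential([
    layers.Dense(512, activation='relu', input_dim=160),
    layers.Dense(1024, activation='relu'),
    layers.Dense(784, activation='sigmoid'),
    layers.Reshape((28, 28, 1))
])
```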

Loss Functions:

It uses two loss functions: a margin (probabilistic) loss used for classifying the digit images, and a reconstruction loss, which is the mean squared error. The margin loss is simple to understand once you look at the following code.
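A sketch of the margin loss from the paper, using its m+ = 0.9, m− = 0.1 and λ = 0.5 constants:

```python
def margin_loss(y_true, y_pred):
    # L_k = T_k * max(0, 0.9 - ||v_k||)^2 + 0.5 * (1 - T_k) * max(0, ||v_k|| - 0.1)^2
    L = y_true * tf.square(tf.maximum(0.0, 0.9 - y_pred)) + \
        0.5 * (1.0 - y_true) * tf.square(tf.maximum(0.0, y_pred - 0.1))
    return tf.reduce_mean(tf.reduce_sum(L, axis=1))
```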

Training and Testing of model:

Now let's define the training and testing models and train on the MNIST digit dataset:
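A sketch tying the pieces together; the optimizer, epoch count and the 0.392 reconstruction weight (0.0005 per pixel * 784 pixels, following common implementations of the paper) are assumptions:

```python
train_model = tf.keras.Model([inputs, labels], [out_caps, decoder(masked_by_label)])
eval_model = tf.keras.Model(inputs, [out_caps, decoder(masked_by_max)])

train_model.compile(optimizer='adam',
                    loss=[margin_loss, 'mse'],
                    loss_weights=[1.0, 0.392])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# The decoder target is the input image itself
train_model.fit([x_train, y_train], [y_train, x_train],
                batch_size=128, epochs=10,
                validation_data=([x_test, y_test], [y_test, x_test]))
```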

On the test dataset it was able to achieve 99.09% accuracy. Pretty good, yeah! The reconstructed images also look good. Here are the reconstructed images generated by the decoder network.

Capsule networks have shown promising results but are yet to be explored thoroughly. Research on capsule networks is still at an early stage, but it has given a clear indication that the idea is worth pursuing.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Capsule Networks

Since the introduction of AlexNet in 2012, convolutional neural networks (CNNs) have been the go-to approach for a wide range of image problems. CNNs perform really well in image classification, object detection, semantic segmentation and many more tasks.


But are CNNs the best solution for image problems? Do they translate all the features present in an image when predicting the output?

Problems with Convolutional Neural Networks:

  1. CNNs use pooling layers to reduce the number of parameters and speed up computation. In the process, they lose some useful feature information.
  2. CNNs also require a huge amount of training data; otherwise they will not reach high accuracy on the test dataset.
  3. CNNs essentially try to achieve “viewpoint invariance”, meaning that small changes to the input should not change the output. They also do not store the relative spatial relationships between features.

To solve these problems we need a better approach. That is where the capsule network comes in: a network that has given early indications that it can solve the problems associated with convolutional neural networks. Geoffrey E. Hinton et al. published a paper named “Dynamic Routing Between Capsules”, in which they introduced the capsule network and the dynamic routing algorithm.

What is a Capsule Network?

A capsule is a group of neurons that uses vectors to represent an object or object part. The length of the vector represents the presence of the object, and the orientation of the vector represents its pose (size, position, orientation, etc.). A group of these capsules forms a capsule layer, and these layers in turn form a capsule network. It has some advantages over CNNs:

  1. Capsule networks try to achieve “equivariance”: when the input changes a little, the output also changes, but the length of the vector stays the same, still predicting the presence of the same object.
  2. Capsule networks also require less training data, because they preserve the spatial relationships between features.
  3. Capsule networks do not use pooling layers, which removes the problem of losing useful feature information.

How a Capsule Network works?

Usually in CNNs we deal with layers, i.e. one layer passes information to the subsequent layer and so on. CapsNet follows the same flow, as shown below.

The diagram shown above represents the network architecture used in the paper for the MNIST dataset. The initial layer uses convolution to extract low-level features from the image and passes them to a primary capsule layer.

The primary capsule layer reshapes the output of the previous convolution layer into capsules containing vectors of equal dimension. The length of each of these vectors represents the probability that an object is present, which is why we need a non-linear “squashing” function to scale the length of every vector to between 0 and 1. The squashing function from the paper is:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) * (s_j / ||s_j||)

where s_j is the input vector, ||s_j|| is its norm and v_j is the output vector. This is the output of the primary capsule layer. Capsules in the next layer are generated using the dynamic routing algorithm, described next.

Routing Algorithm:

The main feature of the routing algorithm is agreement between capsules: lower-level capsules send their values to the higher-level capsules they agree with.

Let’s take the example of an image of a face. Suppose there are four capsules in a lower layer, representing the mouth, nose, left eye and right eye respectively. If all four agree on the same face position, each sends its values to the output-layer capsule signalling the presence of a face.

To produce the output of the routing capsules (capsules in the higher layer), the output of the lower layer (u) is first multiplied by a weight matrix W and then weighted by a coupling coefficient c. This coefficient determines which capsule from the lower layer sends its output to which capsule in the higher layer.

The coupling coefficients c are learned iteratively. The sum of all the c for a capsule ‘i’ in the lower layer is equal to 1, which maintains the probabilistic interpretation of vector length as the probability of the presence of an object. The c values are obtained by applying a softmax to weights b, whose initial values are set to zero.

Routing agreement is determined by updating the weights b: the previous b is incremented by the scalar product between the capsule in the higher layer and the capsule in the lower layer (shown in line 7 of the algorithm below).
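The original algorithm figure comes from the paper; here is a close paraphrase of its routing procedure, where u_hat are the prediction vectors W*u and r is the number of routing iterations:

```
 1: procedure ROUTING(u_hat, r, l)
 2:   for all capsules i in layer l and capsules j in layer (l+1): b_ij <- 0
 3:   for r iterations do
 4:     for all capsules i in layer l: c_i <- softmax(b_i)
 5:     for all capsules j in layer (l+1): s_j <- sum_i c_ij * u_hat_j|i
 6:     for all capsules j in layer (l+1): v_j <- squash(s_j)
 7:     for all capsules i in layer l and capsules j in layer (l+1): b_ij <- b_ij + u_hat_j|i . v_j
 8:   return v_j
```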

To further improve the capsule layer's estimates, the authors added a decoder network, which tries to reconstruct the original image from the output of the digit capsule layer. It simply adds a few fully connected layers on top of the 16-dimensional capsule outputs.

Now we have seen the basic concepts of a capsule network. The best way to gain a deeper understanding is to implement the code, which you can see in the next blog.

The Next Blog: Implementing Capsule Network in Keras

Referenced Research Paper: Dynamic Routing Between Capsules

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Feeding output of a given intermediate layer in Keras as the input to another network

Keras is a high-level neural network library built for fast experimentation, user friendliness and easy extensibility. It is a highly recommended library for beginners in neural networks. In this blog we will learn how to use the output of an intermediate layer of a neural network as input to another network.

Sometimes you might get stuck while using the output of an intermediate layer, with errors like ‘graph disconnected’. Let's see how to solve this through code.

First, let's create an autoencoder model. If you are not aware of what an autoencoder is, you can follow this blog.
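A minimal sketch of such an autoencoder; the layer sizes and the names encoder_outputs, dense_layer_d and decoder_output are assumptions used throughout the rest of this post:

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# ----- Encoder -----
input_img = Input(shape=(784,))
encoded = Dense(512, activation='relu')(input_img)
encoded = Dense(128, activation='relu')(encoded)
encoder_outputs = Dense(32, activation='relu')(encoded)  # the encoder's final layer

# ----- Decoder -----
dense_layer_d = Dense(128, activation='relu')(encoder_outputs)
decoded = Dense(512, activation='relu')(dense_layer_d)
decoder_output = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoder_output)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```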

In the sketch above we have created an autoencoder model, with encoder_outputs holding the encoder's final activations. Now if you want to create a decoder network from this model, with the encoder_outputs layer as its input, what should you do? A beginner will do something like this:
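The naive attempt, kept here as the broken version for illustration:

```python
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

# Create a fresh input and point the model straight at decoder_output
decoder_input = Input(shape=(32,))
decoder = Model(decoder_input, decoder_output)  # ValueError: Graph disconnected
```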

But this will throw a ‘graph disconnected’ error. This is because the dense_layer_d layer is already connected to a previous layer (encoder_outputs) in the original graph, while the new decoder_input is not connected to it at all. To solve this problem you can do something like this:

Earlier we created the model autoencoder. Now, to build a decoder starting from its intermediate layer, use the following steps:

  1. Find the index of the first decoder layer (in the sketch above, dense_layer_d is the 3rd layer from the end, so index -3).
  2. Use autoencoder.layers to get that layer.
  3. Iterate through the layers of the autoencoder model from that index onwards, up to and including the decoder_output layer.
  4. Then create a model using decoder_input and the last iterated layer, as shown in the sketch below.
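A sketch of these steps, reusing the decoder_input defined above:

```python
# Re-apply the trained decoder layers, one by one, to the new input
x = decoder_input
for layer in autoencoder.layers[-3:]:  # from dense_layer_d to decoder_output
    x = layer(x)

decoder = Model(decoder_input, x)
decoder.summary()
```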

This successfully creates a decoder model that takes the output of the intermediate layer encoder_outputs as its input. And that's it!

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Custom Layers in Keras

A model in Keras is composed of layers. There are built-in layers in Keras that you can directly import, like Conv2D, MaxPooling2D, Flatten, Reshape, etc. But sometimes you need to add your own custom layer. In this blog, we will learn how to add a custom layer in Keras.

There are basically two types of custom layers that you can add in Keras.

Lambda Layer

A Lambda layer is useful whenever you need to perform some operation on the previous layer without adding any trainable weights.

Let's say you want to apply your own activation function (one not built into Keras) to a layer. You first define a function that takes the output of the previous layer as input and applies the custom activation to it. You then pass this function to a Lambda layer, as in the sketch below.
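A minimal sketch, assuming a made-up “scaled tanh” activation purely for illustration:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

def scaled_tanh(x):
    # Hypothetical custom activation: tanh scaled to (-2, 2)
    return 2.0 * tf.tanh(x)

inputs = Input(shape=(64,))
x = Dense(32)(inputs)           # no built-in activation here
x = Lambda(scaled_tanh)(x)      # apply the custom activation
model = Model(inputs, x)
```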

Custom Class Layer

Sometimes you want to create your own layer with trainable weights, something not built into Keras. In that case you need to create a custom layer class and define the following methods:

  1. __init__ method to initialize class variables and super class variables.
  2. build method to define the weights.
  3. call method where you perform all the layer's operations.
  4. compute_output_shape method to define the output shape of this custom layer.

Let's see an example of a custom layer class. Here you only need to focus on the architecture of the class.
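A sketch of a simple trainable layer (a plain linear transform, chosen as an illustration; the class name is hypothetical):

```python
import tensorflow as tf
from tensorflow.keras import layers

class MyDense(layers.Layer):
    def __init__(self, output_dim, **kwargs):
        # Initialize class variables and the super class
        self.output_dim = output_dim
        super().__init__(**kwargs)

    def build(self, input_shape):
        # Define the trainable weights of the layer
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1], self.output_dim),
                                      initializer='glorot_uniform',
                                      trainable=True)
        self.built = True  # marks the layer as built

    def call(self, inputs):
        # All of the layer's logic lives here
        return tf.matmul(inputs, self.kernel)

    def compute_output_shape(self, input_shape):
        # Output shape of this custom layer
        return (input_shape[0], self.output_dim)
```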

In the build method, setting self.built = True is necessary. Also, you can see that all the logic is written inside the call(self, inputs) method, and compute_output_shape defines the output shape of the layer.

You can also pass multiple input tensors to this custom layer. The only thing you need to do is pass the inputs as a list; inside call(self, inputs), inputs will then be a list of tensors.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Saving and Loading models in Keras

Generally, a deep learning model takes a large amount of time to train, so it's better to know how to save a trained model. In this blog we will learn how to save a whole Keras model, i.e. its architecture, weights and optimizer state.

Let's first create a model in Keras. This is a simple autoencoder model; if you need to know more about autoencoders, please refer to this blog.
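A minimal sketch of such a model (the layer sizes are illustrative):

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_img = Input(shape=(784,))
encoded = Dense(64, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```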

Above we have created a Keras model named “autoencoder”. Now let's see how to save this model.

Saving and loading only architecture of a model

In Keras, you can save and load the architecture of a model in two formats: JSON or YAML. Models saved in these formats are human readable and can be edited if needed, for example:
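A sketch using the JSON variant; the to_yaml/model_from_yaml calls are analogous, though YAML support has been removed in recent TensorFlow releases:

```python
from tensorflow.keras.models import model_from_json

# Serialize the architecture (no weights) to a JSON string
json_string = autoencoder.to_json()
with open('autoencoder_architecture.json', 'w') as f:
    f.write(json_string)

# Later: rebuild an untrained model with the same architecture
with open('autoencoder_architecture.json') as f:
    new_model = model_from_json(f.read())
```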

Saving and Loading Weights of a Keras Model

Along with the model architecture, you will also need the model weights to predict outputs from the trained model:
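A sketch of saving and restoring only the weights (the HDF5 filename is illustrative):

```python
# Save only the trained weights
autoencoder.save_weights('autoencoder_weights.h5')

# Restore them into a model with the same architecture
new_model.load_weights('autoencoder_weights.h5')
```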

Saving and Loading Both Architecture and Weights in one File
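A sketch of saving everything in a single HDF5 file and loading it back:

```python
from tensorflow.keras.models import load_model

# Save architecture, weights, loss, optimizer and optimizer state together
autoencoder.save('autoencoder_model.h5')

# Later: restore the full model and resume training or predict
autoencoder = load_model('autoencoder_model.h5')
```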

This will save the following four things in the “autoencoder_model.h5” file:

  1. Model Architecture
  2. Model Weights
  3. Loss and Optimizer
  4. State of the optimizer, allowing you to resume training where you left off.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Log Transformation

Log transformation means replacing each pixel value with its logarithm. The general form of log transformation function is

s = T(r) = c*log(1+r)

where ‘s’ and ‘r’ are the output and input pixel values, and c is a scaling constant given by the following expression (for an 8-bit image):

c = 255/(log(1 + max_input_pixel_value))

The value of c is chosen so that the maximum output value matches the bit depth used, e.g. for an 8-bit image, c is chosen such that the maximum output value is 255.

For an 8-bit image, log transformation looks like this

Clearly, the low intensity values in the input image are mapped to a wider range of output levels. The opposite is true for the higher values.

Applications:

  • Expands the dark pixels in the image while compressing the brighter pixels
  • Compresses the dynamic range (display of Fourier transform).

Dynamic range refers to the ratio of the maximum to minimum intensity values. When the dynamic range of an image exceeds that of the display device (as with the Fourier transform), the lower values are suppressed. To overcome this, we use the log transform: it first compresses the dynamic range, and the image is then rescaled to the dynamic range of the display device. In this way, lower values are enhanced and the image shows significantly more detail.

The code below shows how to apply the log transform using OpenCV and Python.
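A minimal sketch (the input filename is illustrative):

```python
import cv2
import numpy as np

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

# c = 255 / log(1 + max input pixel value)
c = 255 / np.log(1 + np.max(img))

# s = c * log(1 + r), computed in float and then mapped back to 8-bit
log_image = c * np.log(1 + img.astype(np.float64))
log_image = np.uint8(log_image)

cv2.imwrite('log_transformed.jpg', log_image)
```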

Thus, a logarithmic transform is appropriate when we want to enhance the low pixel values at the expense of loss of information in the high pixel values.

Be careful: if most of the detail is present in the high pixel values, applying the log transform results in a loss of information, as shown below.

Before
After

In the next blog, we will discuss Power law or Gamma transformation. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Image Negatives or inverting images using OpenCV

Image negatives: most of you might have heard this term from the good old days, when negatives were used to produce photographic prints. Film photography has not yet become obsolete, as some wedding photographers are still shooting film, but because one has to pay for film rolls and processing fees, most people have switched to digital.

I recently heard of the Foveon X3 direct image sensor, which claims to combine the power of a digital sensor with the essence of film. (Check here)

An image negative is produced by subtracting each pixel from the maximum intensity value, e.g. for an 8-bit image, the maximum intensity value is 2^8 − 1 = 255, so each pixel is subtracted from 255 to produce the output image.

Thus, the transformation function used in image negative is

s = T(r) = L – 1 – r

where L − 1 is the maximum intensity value, and s and r are the output and input pixel values respectively.

For grayscale images, light areas appear dark and vice versa. For color images, colors are replaced by their complementary colors. Thus, red areas appear cyan, greens appear magenta, and blues appear yellow, and vice versa.
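Method 1

A direct sketch of s = L − 1 − r using NumPy broadcasting (the filename is illustrative):

```python
import cv2

img = cv2.imread('input.jpg')

# s = L - 1 - r, with L = 256 for an 8-bit image
negative = 255 - img

cv2.imwrite('negative.jpg', negative)
```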

The output looks like this

Method 2

OpenCV provides a built-in function, cv2.bitwise_not(), that inverts every bit of an array. It takes the original image as input and outputs the inverted image. Below is a sketch of this.
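A minimal sketch, assuming an 8-bit input image:

```python
import cv2

img = cv2.imread('input.jpg')

# Bitwise NOT flips every bit: for uint8 this equals 255 - pixel
negative = cv2.bitwise_not(img)

cv2.imwrite('negative_bitwise.jpg', negative)
```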

There is a long-running debate about whether black-on-white or white-on-black is better. To my knowledge, the image negative favors black-on-white, so it is suited to enhancing the white or gray information embedded in the dark regions of an image, especially when the black areas dominate in size.

Application: in grayscale images with a black background, the foreground gray levels are not clearly visible. By converting the background to white, the gray levels become more visible.

In the next blog, we will discuss Log transformations in detail. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Compression of data using Autoencoders

In the last blog, we discussed what autoencoders are. In this blog, we will learn how autoencoders can be used to compress data and reconstruct the original data.

Here I have used the MNIST dataset, which contains digit images (0 to 9), about 45 MB in total. Let's see the code to download the data using Python.
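A sketch using the dataset loader built into Keras:

```python
from tensorflow.keras.datasets import mnist

# Downloads MNIST on first use and caches it locally
(x_train, _), (x_test, _) = mnist.load_data()

# Scale to [0, 1] and add a channel axis for the convolutional layers
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
```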

Since we want to compress the dataset and reconstruct the original data from it, we first create a convolutional autoencoder. Let's see the code:
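A sketch of a small convolutional autoencoder (the layer sizes are illustrative):

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_img = Input(shape=(28, 28, 1))

# Encoder: 28x28x1 -> 7x7x8
x = Conv2D(16, 3, activation='relu', padding='same')(input_img)
x = MaxPooling2D(2, padding='same')(x)
x = Conv2D(8, 3, activation='relu', padding='same')(x)
encoded = MaxPooling2D(2, padding='same')(x)

# Decoder: 7x7x8 -> 28x28x1
x = Conv2D(8, 3, activation='relu', padding='same')(encoded)
x = UpSampling2D(2)(x)
x = Conv2D(16, 3, activation='relu', padding='same')(x)
x = UpSampling2D(2)(x)
decoded = Conv2D(1, 3, activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```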

From this autoencoder model, I created separate encoder and decoder models: the encoder model compresses the data, and the decoder model is used to reconstruct the original data. Then the autoencoder model is trained:
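A sketch of carving out the two sub-models and training (the epoch count is illustrative):

```python
encoder = Model(input_img, encoded)

# Decoder: re-apply the last 5 layers of the autoencoder to a new input
decoder_input = Input(shape=(7, 7, 8))
x = decoder_input
for layer in autoencoder.layers[-5:]:
    x = layer(x)
decoder = Model(decoder_input, x)

autoencoder.fit(x_train, x_train, epochs=10, batch_size=128,
                validation_data=(x_test, x_test))
```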

Using the encoder model, we can save the compressed data into a text file; it comes to about 18 MB, much less than the original 45 MB.
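A sketch of writing the compressed representations out with NumPy, flattening each 7*7*8 code into one row (the filename and float format are illustrative):

```python
import numpy as np

compressed = encoder.predict(x_test)                 # shape (10000, 7, 7, 8)
np.savetxt('compressed_data.txt',
           compressed.reshape(len(compressed), -1),  # one code per row
           fmt='%.4f')
```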

Now, how do we reconstruct the compressed data when the original is needed? The simple solution is to save the decoder model and its weights, which can later be used to reconstruct the compressed data. Let's save the decoder model and its weights:
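A sketch, mirroring the save options from the saving/loading post above:

```python
# Save the decoder architecture and weights separately
with open('decoder_architecture.json', 'w') as f:
    f.write(decoder.to_json())
decoder.save_weights('decoder_weights.h5')
```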

Finally, we have our compressed data and the decoder model. Let's see how we can reconstruct the original data using these two:
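A sketch of the reconstruction path, matching the filenames used above:

```python
import numpy as np
from tensorflow.keras.models import model_from_json

# Load the saved decoder and its weights
with open('decoder_architecture.json') as f:
    decoder = model_from_json(f.read())
decoder.load_weights('decoder_weights.h5')

# Load the compressed codes and decode them back into images
codes = np.loadtxt('compressed_data.txt').reshape(-1, 7, 7, 8)
reconstructed = decoder.predict(codes)  # shape (N, 28, 28, 1)
```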

Above are the outputs from the decoder model.

It looks fascinating to compress data to a smaller size and get the same data back when we need it, but there are some real problems with this method.

The problem is that autoencoders cannot generalize: an autoencoder can only reconstruct images similar to those it was trained on. But with the advancement of deep learning, the days when this kind of learned compression becomes practical may not be far away.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Sparse Autoencoders

In the last blog we looked at autoencoders and their applications. In this blog we will learn about one of their variants, the sparse autoencoder.

In every autoencoder, we try to learn a compressed representation of the input. Take a simple autoencoder with an input vector of dimension 1000, compressed into 500 hidden units and reconstructed back into 1000 outputs. The hidden units will learn the correlated features present in the input. But what if the input features are completely random? Then it will be difficult for the hidden units to learn any interesting structure in the data. In that situation, we can increase the number of hidden units and add sparsity constraints. Now the question is: what are sparsity constraints?

When a sparsity constraint is added to a hidden layer, only some units (those with large activation values) are activated and the rest are driven to zero. So even with a large number of hidden units (as in the example above), only a few fire for any given input, and the network still learns useful structure in the data.

The simplest way to add a sparsity constraint is in Keras: you simply add an activity_regularizer to a layer and it does the rest, as in the sketch below.
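A minimal sketch, using an L1 activity regularizer to penalize large activations (the layer sizes and the 1e-5 penalty are illustrative):

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers

input_img = Input(shape=(1000,))

# The L1 activity penalty pushes most hidden activations towards zero
encoded = Dense(500, activation='relu',
                activity_regularizer=regularizers.l1(1e-5))(input_img)
decoded = Dense(1000, activation='sigmoid')(encoded)

sparse_autoencoder = Model(input_img, decoded)
sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```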

But if you want to add sparsity constraints by writing your own function, you can follow the reference given below.

References: Sparse Autoencoders

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.