Tag Archives: tensorflow

Monitoring Training in Keras: Callbacks

Keras is a high-level API that can run on top of TensorFlow, CNTK and Theano. Keras is preferable because it is easy and fast to learn. In this blog we will learn about a set of functions named callbacks, used during training in Keras.

Callbacks provide some advantages over plain training in Keras. Here I will explain the important ones.

  • Callbacks can terminate training when a NaN loss occurs.
  • Callbacks can save the model after every epoch, or save only the best model.
  • Early stopping: callbacks can stop training when a monitored quantity (e.g. accuracy) stops improving.

Terminate the Training when a NaN Loss Occurs

Let's see the code for terminating training when a NaN loss occurs:
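A minimal sketch using Keras's built-in TerminateOnNaN callback (model, x_train and y_train are assumed to be defined elsewhere):

```python
from keras.callbacks import TerminateOnNaN

# stops training as soon as the loss becomes NaN
nan_terminator = TerminateOnNaN()

# model, x_train and y_train are assumed to be defined elsewhere
model.fit(x_train, y_train, epochs=10, callbacks=[nan_terminator])
```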

Saving Model using Callbacks

To save the model after every epoch in Keras, we need to import ModelCheckpoint from keras.callbacks. Let's see the code below, which saves the model whenever the validation loss decreases.
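A sketch of such a checkpoint (the model and data variables are assumptions):

```python
from keras.callbacks import ModelCheckpoint

# save a checkpoint whenever the validation loss decreases
checkpoint = ModelCheckpoint(filepath='best_model.hdf5',
                             monitor='val_loss',
                             save_best_only=True,
                             mode='auto')

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=10, callbacks=[checkpoint])
```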

In the above code, we first create a ModelCheckpoint object by passing its required parameters.

  • "filepath" defines the path where the checkpoints will be saved. If you want to keep only the best model, pass a fixed name such as "best_model.hdf5", which will overwrite the previously saved checkpoint.
  • "monitor" decides which quantity you want to watch while training.
  • "save_best_only" saves the model only when the monitored quantity improves (here, when the validation loss decreases).
  • "mode" is one of {auto, min, max}; with max, a checkpoint is saved when the monitored quantity reaches a new maximum, and with min, a new minimum.

Then finally make a list of callbacks and pass it to model.fit() via the callbacks parameter.

Early Stopping

Callbacks can stop training when a monitored quantity has stopped improving. Let's see how:
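A sketch with the built-in EarlyStopping callback (the monitored quantity and thresholds are assumptions):

```python
from keras.callbacks import EarlyStopping

# stop once val_loss has not improved by at least min_delta
# for `patience` consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss',
                               min_delta=0.001,
                               patience=5,
                               mode='auto')

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, callbacks=[early_stopping])
```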

  • min_delta: the minimum change in the monitored quantity that counts as an improvement.
  • patience: the number of epochs with no improvement after which training will be stopped.
  • mode: in auto mode, the direction (min or max) is inferred from the name of the monitored quantity.
  • baseline: a baseline value for the monitored quantity; training stops if the model shows no improvement over it.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Calculating Screen Time of an Actor using Deep Learning

The screen time of an actor in a movie or an episode is very important. Many actors get paid according to their total screen time, and we may also want to know how much time our favorite character was on screen. So, have you ever wondered how you can calculate the total screen time of an actor? One plausible answer is deep learning.

With the advancement of deep learning, it is now possible to solve many difficult problems. In this blog, we will learn how to use the transfer learning and image classification concepts of deep learning to calculate the screen time of an actor.

To solve any problem with deep learning, the first requirement is the data. For this tutorial, we will use a video clip from the famous TV show “Friends”. We are going to calculate the screen time of my favorite character “Ross”.

Creating Dataset

First, we need to get a video. To do this I have downloaded a video from YouTube using the pytube library. For more understanding of pytube, you can follow this blog or use the following code to get started.
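A minimal sketch (the URL is a placeholder for the clip you want):

```python
from pytube import YouTube

# placeholder URL for the "Friends" clip
yt = YouTube('https://www.youtube.com/watch?v=VIDEO_ID')
yt.streams.filter(file_extension='mp4').first().download(output_path='videos')
```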

Now we have our data in the form of a video, which is nothing but a sequence of frames (images). Since we are going to solve this problem using image classification, we need to extract the images from this video. For this task, I have used OpenCV as shown below.
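A sketch of the frame extraction (file paths are assumptions):

```python
import cv2

cap = cv2.VideoCapture('videos/friends.mp4')

count = 0
while True:
    ret, frame = cap.read()
    if not ret:           # end of video
        break
    cv2.imwrite('frames/frame%d.jpg' % count, frame)
    count += 1

cap.release()
```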

The video is now converted into individual frames. In this problem there are two classes, "Ross" and "No Ross". To create a dataset, we need to separate the images into these two classes manually. For this, I have created a folder named "data" containing two sub-folders, "ross" and "no_ross", and then manually added images to these two sub-folders. After creating the dataset, we are ready to dive into the code and concepts.

Input Data and Preprocessing

We have our data in the form of images. To prepare this data as input to our neural network, we need to do some preprocessing with the following steps (a sketch is given after the list):

  • Read all images one by one using OpenCV
  • Resize each image to (224, 224, 3), the input size of the model
  • Divide the data by 255 so that all input features to the neural network are in the same range
  • Append each image to its corresponding class
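A sketch of this preprocessing, assuming the folder layout described above:

```python
import os
import cv2
import numpy as np

X, y = [], []
for label, folder in enumerate(['data/no_ross', 'data/ross']):
    for name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, name))
        img = cv2.resize(img, (224, 224))   # model input size
        X.append(img)
        y.append(label)                     # 0 = no_ross, 1 = ross

X = np.array(X, dtype='float32') / 255.0    # scale features to [0, 1]
y = np.array(y)
```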

Transfer Learning

Since we have only 6814 images, it will be difficult to train a neural network on such a small dataset. Here comes the concept of transfer learning.

With the help of transfer learning, we can use the features generated by a model trained on a large dataset in our own model. Here we will use the VGG16 model trained on the "imagenet" dataset. For this, we are using TensorFlow's high-level API Keras. With Keras, you can directly import the VGG16 model as shown in the code below.
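A minimal sketch of the import:

```python
from keras.applications.vgg16 import VGG16

# convolutional base pre-trained on imagenet, without the top layers
vgg_model = VGG16(weights='imagenet', include_top=False,
                  input_shape=(224, 224, 3))
```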

The VGG16 model trained on the imagenet dataset predicts a thousand different classes, but in this problem we have only two, "Ross" or "No Ross". That's why we are using include_top = False above, which signifies that we are not including the fully connected layers from the VGG16 model. Now we will pass our input data to vgg_model and generate the features.
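For example (a sketch; X is the preprocessed image array from above):

```python
# run the images through the convolutional base once and keep
# the resulting 7*7*512 feature maps
X_features = vgg_model.predict(X)
print(X_features.shape)   # expected: (6814, 7, 7, 512)
```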

Network Architecture

Since we are not including the fully connected layers from the VGG16 model, we need to create a model with some fully connected layers and an output layer for our binary decision, "Ross" or "No Ross". The output features from the VGG16 model have shape 7*7*512, which will be the input shape for our model. Here I am also using a dropout layer to make the model less prone to over-fitting. Let's see the code:
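A sketch of such a classifier head; the layer sizes are assumptions:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

model = Sequential([
    Flatten(input_shape=(7, 7, 512)),   # VGG16 feature maps
    Dense(1024, activation='relu'),
    Dropout(0.5),                       # reduces over-fitting
    Dense(1, activation='sigmoid'),     # Ross / No Ross
])
```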

Splitting Data into Train and Validation

Now we have the input features from the VGG16 model and our own network architecture defined above. The next thing is to train this neural network, but we are lacking validation data. We have 6814 images, so we will split them into 5000 training images and 1814 validation images.

According to this split, we also create the corresponding output labels (y) for the training and validation data.
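A sketch of the split (this assumes the features and labels are already shuffled):

```python
train_x, valid_x = X_features[:5000], X_features[5000:]
train_y, valid_y = y[:5000], y[5000:]
```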

Training the Network

All set, we are ready to train our model. Here, we will use stochastic gradient descent as the optimizer and binary cross-entropy as our loss function. We are also going to save a checkpoint of the best model according to its validation accuracy.

I am using a batch size of 64 and 10 epochs to train.
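A sketch of the training step (the metric name can vary across Keras versions):

```python
from keras.callbacks import ModelCheckpoint

model.compile(optimizer='sgd', loss='binary_crossentropy',
              metrics=['accuracy'])

# keep only the weights that do best on the validation set
checkpoint = ModelCheckpoint('best_model.hdf5', monitor='val_accuracy',
                             save_best_only=True, mode='max')

model.fit(train_x, train_y, validation_data=(valid_x, valid_y),
          batch_size=64, epochs=10, callbacks=[checkpoint])
```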

The training and validation accuracy look quite pleasing. Now let's calculate the screen time of "Ross".

Calculating Screen Time

To test our trained model and calculate the screen time, I have downloaded another "Friends" video clip from YouTube and extracted the images. To calculate the screen time, I first used the trained model to predict for each image which class it belongs to, "Ross" or "No Ross". Since the video is made up of 24 frames per second, we count the number of frames predicted as having "Ross" in them and then divide by 24 to get the number of seconds "Ross" was on screen.
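A sketch of this counting step (test_features stands for the VGG16 features of the test frames):

```python
# classify every test frame and count the "Ross" ones
preds = model.predict(test_features)
ross_frames = int((preds > 0.5).sum())

screen_time = ross_frames / 24.0   # the clip runs at 24 fps
print('Ross was on screen for about %.0f seconds' % screen_time)
```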

This test video clip runs at 24 frames per second, and the number of frames predicted as having "Ross" in them is 4715. So the screen time for Ross will be 4715/24 ≈ 196 seconds.

Summary

We can see good accuracy on the train and validation datasets, but when I tested the model on the test dataset, the accuracy was about 65%. One reason I figured out is the small amount of training data; if you can get more data, the accuracy can be higher. Another reason can be covariate shift, which means the test dataset is quite different from the training dataset due to different video quality.

This type of technique can be very helpful in calculating screen time of a particular character.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API

Here, we will learn how to use tensorflow object detection API with the computer’s webcam to play a snake game. We will use hand gestures instead of the keyboard.

We will use following steps to play snake game using tensorflow object detection API:

  1. Generate dataset.
  2. Convert train and test datasets into tfrecord format.
  3. Train a pre-trained model using generated data.
  4. Integrate trained model with snake game.
  5. Play the snake game using your own hand gestures.

In this blog, we will cover only the first step, i.e. how to create your own training data; the remaining steps will be covered in the subsequent blogs.

You can find code here.

Generate Dataset

A snake game generally involves four directions of movement: up, down, right and left. For each of the four directions, we need to generate at least 100 images. You can use your phone or laptop camera to do this. Try to generate images with different backgrounds for better generalization. Below are some examples of images of hand gestures.

Hand Gestures

Now we have our captured images of hand gestures. The next thing is to annotate these images according to their classes, which means drawing rectangular boxes around the hand gestures and labelling them appropriately. Don't worry, there is a tool named LabelImg which is highly helpful for annotating images to create training and test datasets. To get started with LabelImg you can follow their GitHub link. The start screen of LabelImg looks like this.

On the left side of the screen, you can find various options. Click on Open dir and choose the input image folder. Then click on Change save dir and select the output folder where the generated XML files will be saved. Each XML file will contain the coordinates of the rectangular boxes you drew in the image, something like this.
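An illustrative annotation in the Pascal VOC format that LabelImg writes (all values here are placeholders):

```xml
<annotation>
    <folder>images</folder>
    <filename>up_001.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>up</name>
        <bndbox>
            <xmin>153</xmin>
            <ymin>98</ymin>
            <xmax>310</xmax>
            <ymax>251</ymax>
        </bndbox>
    </object>
</annotation>
```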

To create a rectangular box in an image using LabelImg, you just need to press 'W', draw the box and save it. You can create one or multiple boxes in one image, as shown in the figure below. Repeat this for all the images.

Now we have the images and their corresponding XML files. Next, we will separate this dataset into training and testing sets in a 90/10 ratio. To do this, we put 90% of the images of each class ('up', 'right', 'left' and 'down') and their corresponding XML files in one folder, and the other 10% in another folder.

That's all for creating the dataset. In the next blog we will see how to create TFRecord files from these datasets, which will be used for training the model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part II

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Implementing Capsule Network in Keras

In the last blog we saw what a capsule network is and how it can overcome the problems associated with a convolutional neural network. In this blog we will implement a capsule network in Keras.

You can find full code here.

Here, we will use the handwritten digit dataset (MNIST) and train the capsule network to classify the digits. The MNIST digit dataset consists of grayscale images of size 28*28.

The capsule network architecture is somewhat similar to a convolutional neural network, except for the capsule layers. We can break the implementation of the capsule network into the following steps:

  1. Initial convolutional layer
  2. Primary capsule layer
  3. Digit capsule layer
  4. Decoder network
  5. Loss Functions
  6. Training and testing of model

Initial Convolution Layer:

Initially we use a convolution layer to detect the low-level features of an image. It uses 256 filters, each of size 9*9, with stride 1 and relu activation. The input image size is 28*28; after applying this layer the output size will be 20*20*256.
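A sketch of this layer:

```python
from keras.layers import Input, Conv2D

# 28*28 grayscale input
x = Input(shape=(28, 28, 1))

# 256 filters of size 9*9, stride 1 -> output 20*20*256
conv1 = Conv2D(filters=256, kernel_size=9, strides=1,
               activation='relu', name='conv1')(x)
```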

Primary Capsule Layer:

The output from the previous layer is passed to 256 filters, each of size 9*9 with a stride of 2, which produces an output of size 6*6*256. This output is then reshaped into 8-dimensional vectors, so the shape becomes 6*6*32 capsules, each of which is 8-dimensional. Each capsule is then passed through a non-linear function (squash) so that the length of the output vector is kept between 0 and 1.
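A sketch of the primary capsule layer, using conv1 from above:

```python
from keras import backend as K
from keras.layers import Conv2D, Reshape, Lambda

def squash(v, axis=-1):
    # keeps the vector's direction, maps its length into [0, 1)
    s = K.sum(K.square(v), axis, keepdims=True)
    return (s / (1 + s)) * v / K.sqrt(s + K.epsilon())

# 256 filters of size 9*9, stride 2 -> 6*6*256,
# reshaped into 1152 capsules of 8 dimensions each
primary = Conv2D(filters=256, kernel_size=9, strides=2)(conv1)
primary = Reshape(target_shape=(-1, 8))(primary)
primary = Lambda(squash)(primary)
```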

Digit Capsule Layer:

The logic and algorithm used for this layer were explained in the previous blog. Here we will see what we need to do in code to implement it. We need to write a custom layer in Keras. It takes 1152*8 as its input and produces an output of size 10*16, where each of the 10 capsules represents an output class with a 16-dimensional vector. Then each of these 10 capsules is converted into a single value, to predict the output class, using a lambda layer.
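The routing-by-agreement code itself is in the linked repository; assuming digit_caps is the (10, 16) output of that custom layer, the class scores are simply the capsule lengths:

```python
from keras import backend as K
from keras.layers import Lambda

# length of each of the 10 capsules = score of that digit class
output = Lambda(lambda z: K.sqrt(K.sum(K.square(z), axis=-1)),
                name='capsnet')(digit_caps)
```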

Decoder Network:

To make further use of the pose parameters learned by the digit capsule layer, we can add a decoder network to reconstruct the input image. In this part, the decoder network is fed an input of size 10*16 (the digit capsule layer output) and reconstructs the original image of size 28*28. The decoder consists of 3 dense layers having 512, 1024 and 784 nodes.

At training time, the input to the decoder is the output from the digit capsule layer masked with the original labels. This means that all vectors except the one corresponding to the correct label are multiplied by zero, so that the decoder can only be trained on the correct digit capsule. At test time, the input to the decoder is the same digit capsule output, but masked with the longest vector in that layer. Let's see the code.
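A sketch of the training-time masking and the decoder (digit_caps and the one-hot label tensor y_true are assumed from above):

```python
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Flatten, Lambda, Reshape

# zero out every capsule except the one for the true label
masked = Lambda(lambda z: z[0] * K.expand_dims(z[1], -1))([digit_caps, y_true])
masked = Flatten()(masked)           # 10*16 = 160 values for the decoder

# decoder: dense layers of 512, 1024 and 784 nodes
decoder = Sequential([
    Dense(512, activation='relu', input_dim=160),
    Dense(1024, activation='relu'),
    Dense(784, activation='sigmoid'),
    Reshape(target_shape=(28, 28, 1)),
])
reconstruction = decoder(masked)
```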

Loss Functions:

It uses two loss functions: a margin loss used for classifying the digit images, and a reconstruction loss, which is the mean squared error. Let's see the margin loss, which is simple to understand once you look at the following code.
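The margin loss from the CapsNet paper, with m+ = 0.9, m- = 0.1 and a down-weighting factor of 0.5:

```python
from keras import backend as K

def margin_loss(y_true, y_pred):
    L = y_true * K.square(K.maximum(0., 0.9 - y_pred)) + \
        0.5 * (1 - y_true) * K.square(K.maximum(0., y_pred - 0.1))
    return K.mean(K.sum(L, axis=1))
```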

Training and Testing of model:

Now we define our training and testing models and train them on the MNIST digit dataset.
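A sketch of the training call, assuming model is the two-output training model (class scores and reconstruction) built from the pieces above; the loss weight is an assumption:

```python
model.compile(optimizer='adam',
              loss=[margin_loss, 'mse'],
              loss_weights=[1.0, 0.392])

# the model takes images and labels (for masking) as inputs and
# predicts the class scores and the reconstructed images
model.fit([x_train, y_train], [y_train, x_train],
          batch_size=128, epochs=10,
          validation_data=([x_test, y_test], [y_test, x_test]))
```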

On the test dataset it was able to achieve 99.09% accuracy. Pretty good, yeah! The reconstructed images also look good. Here are the reconstructed images generated by the decoder network.

The capsule network comes with promising results and is yet to be explored thoroughly. There are various directions in which it can be explored. Research on capsule networks is still at an early stage, but it has given a clear indication that the idea is worth exploring.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Feeding output of a given intermediate layer in Keras as the input to another network

Keras is a high-level neural network library built for fast experimentation, user friendliness and easy extensibility. It is a highly recommended library for a beginner in neural networks. In this blog we will learn how to use an intermediate layer of a neural network as input to another network.

Sometimes you might get stuck while using the output of an intermediate layer, with errors like 'graph disconnected'. Let's see how we can solve this through the code.

First, let's create an autoencoder model. If you are not aware of what an autoencoder is, you can follow this blog.
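A minimal sketch of such an autoencoder (the layer sizes are assumptions):

```python
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))
encoded = Dense(256, activation='relu')(input_img)
encoded = Dense(128, activation='relu')(encoded)
encoder_outputs = Dense(64, activation='relu')(encoded)   # encoded representation
decoded = Dense(128, activation='relu')(encoder_outputs)  # first decoder layer
decoded = Dense(256, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoded)
```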

In the above code we have created an autoencoder model, whose encoder_outputs layer produces the encoded representation. Now, if you want to create a decoder network from this model with the encoder_outputs layer as its input, what should you do? A beginner will do something like this:
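```python
from keras.layers import Input
from keras.models import Model

# the naive attempt: make a new input and reuse the old output tensor
decoder_input = Input(shape=(64,))
decoder = Model(decoder_input, autoencoder.output)   # raises 'Graph disconnected'
```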

But this will throw a 'graph disconnected' error. This is because the decoder layers are connected to previous layers in the original graph, so their output tensors can only be computed from input_img, not from your new input. To solve this problem, you can do something like this (see the sketch after the steps below):

Earlier we created the model autoencoder. Now, if you want to build a decoder from its intermediate layers, use the following steps:

  1. Find the index of the first decoder layer (in the sketch above it is the 3rd layer from the end, so -3).
  2. Use autoencoder.layers to get that layer.
  3. Iterate through the following layers of the autoencoder model, up to the final output layer.
  4. Then create a model using decoder_input and the last iterated layer, as in the sketch below.
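A sketch following these steps:

```python
from keras.layers import Input
from keras.models import Model

decoder_input = Input(shape=(64,))        # same shape as encoder_outputs
x = decoder_input
for layer in autoencoder.layers[-3:]:     # re-apply the trained decoder layers
    x = layer(x)
decoder = Model(decoder_input, x)
```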

This successfully creates a decoder model which takes the output of the intermediate layer 'encoder_outputs' as its input. And that's it!!

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Custom Layers in Keras

A model in Keras is composed of layers. There are in-built layers present in Keras which you can directly import like Conv2D, Pool, Flatten, Reshape, etc. But sometimes you need to add your own custom layer. In this blog, we will learn how to add a custom layer in Keras.

There are basically two types of custom layers that you can add in Keras.

Lambda Layer

The Lambda layer is useful whenever you need to perform some operation on the previous layer's output and do not want to add any trainable weights for it.

Let's say you want to add your own activation function (one which is not built into Keras) to a layer. Then you first need to define a function which takes the output from the previous layer as input and applies the custom activation to it. We then pass this function to a Lambda layer.
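A sketch with a toy custom activation (the scaled tanh is just an illustration):

```python
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

def custom_activation(x):
    return 2 * K.tanh(x)               # illustrative custom activation

inputs = Input(shape=(10,))
x = Dense(32)(inputs)                  # linear layer, no built-in activation
x = Lambda(custom_activation)(x)       # apply our own activation
outputs = Dense(1)(x)
model = Model(inputs, outputs)
```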

Custom Class Layer

Sometimes you want to create your own layer with trainable weights, which is not built into Keras. In that case you need to create a custom layer class where you define the following methods:

  1. __init__ method to initialize class variables and super class variables
  2. build method to define the weights.
  3. call method where you perform all the layer's operations.
  4. compute_output_shape method to define the output shape of this custom layer

Let's see an example of a custom layer class; it follows the pattern from the Keras documentation. Here you only need to focus on the architecture of the class.
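```python
from keras import backend as K
from keras.layers import Layer

class MyDense(Layer):
    """A hypothetical dense-like layer with a trainable weight matrix."""

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyDense, self).__init__(**kwargs)

    def build(self, input_shape):
        # trainable weight matrix of shape (input_dim, output_dim)
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyDense, self).build(input_shape)   # sets self.built = True

    def call(self, inputs):
        # all the layer's logic lives here
        return K.dot(inputs, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
```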

In the build method, setting self.built = True is necessary (calling super().build() does this for you). Also, you can see that all the logic is written inside the call(self, inputs) method, and compute_output_shape defines the output shape of the layer.

You can also pass multiple input tensors to this custom layer. The only thing you need to do is pass the multiple inputs as a list.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Saving and Loading models in Keras

Generally, a deep learning model takes a large amount of time to train, so it's better to know how to save a trained model. In this blog we will learn how to save a whole Keras model, i.e. its architecture, weights and optimizer state.

Let's first create a model in Keras. This is a simple autoencoder model; if you need to know more about autoencoders, please refer to this blog.
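A small sketch of such a model:

```python
from keras.layers import Input, Dense
from keras.models import Model

# a tiny autoencoder: 784 -> 32 -> 784
input_img = Input(shape=(784,))
encoded = Dense(32, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```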

Above we have created a Keras model named "autoencoder". Now let's see how to save this model.

Saving and loading only architecture of a model

In Keras, you can save and load the architecture of a model in two formats: JSON or YAML. Models saved in these two formats are human readable and can be edited if needed.
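For example, with JSON (older Keras versions offer to_yaml and model_from_yaml for the YAML format in the same way):

```python
from keras.models import model_from_json

# save only the architecture
json_string = autoencoder.to_json()
with open('autoencoder.json', 'w') as f:
    f.write(json_string)

# load only the architecture (weights are not restored)
with open('autoencoder.json') as f:
    model = model_from_json(f.read())
```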

Saving and Loading Weights of a Keras Model

Along with the model architecture, you will also need the model weights to predict output from the trained model.
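```python
# save only the weights
autoencoder.save_weights('autoencoder_weights.h5')

# load them back into a model with the same architecture
model.load_weights('autoencoder_weights.h5')
```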

Saving and Loading Both Architecture and Weights in one File
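A sketch using model.save and load_model:

```python
from keras.models import load_model

# save architecture + weights + optimizer state in one file
autoencoder.save('autoencoder_model.h5')

# later: restore everything and resume training where you left off
autoencoder = load_model('autoencoder_model.h5')
```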

This saves the following four things in the "autoencoder_model.h5" file:

  1. Model Architecture
  2. Model Weights
  3. Loss and Optimizer
  4. State of the optimizer, allowing you to resume training where you left off.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Neural Arithmetic Logic Units

In this tutorial, you will learn about neural arithmetic logic units (NALU).

You can see full TensorFlow implementation of neural arithmetic logic units in my GitHub Repository.

At present, neural networks have a wide range of applications, from simple classification problems to complex self-driving cars, and they are doing very well in these fields. But would you believe that a neural network can't count? Even animals as simple as bees can do that.

The problem is that a neural network cannot perform numerical extrapolation outside its training data. It is not even able to learn a scalar identity function outside its training range. Recently, DeepMind researchers released a paper in which they propose modules that try to solve this problem.

Failure of Neural Networks to Learn a Scalar Identity Function

The problem of neural nets not being able to learn identity relations is not new, but in the paper they demonstrate it with an example.

They used an autoencoder of 3 layers, each of 8 units, and tried to predict the identity relation: for example, if the input is 4, the output should also be 4. They tried different non-linear activation functions in this network, like sigmoid and tanh, but all of them failed to extrapolate the identity relation outside the training data set.

They also saw that some highly linear functions like PReLU were able to reduce the error, which shows that even though neural nets contain functions capable of extrapolation, they fail to learn to use them.

To solve this problem they proposed two models:

  1. NAC (Neural Accumulator)
  2. NALU (Neural Arithmetic Logic Units)

NAC (Neural Accumulator)

The neural accumulator is able to handle addition and subtraction.

NAC is a special case in which the transformation matrix of a layer consists only of the values {-1, 0, 1}. This makes the output of W an addition or subtraction of rows of the input vector, rather than the arbitrary re-scaling produced by non-linear activation functions. As an example, if our input layer consists of X1 and X2, then the output of the NAC will be a linear combination of the input vectors. This keeps numbers consistent throughout the model, no matter how many operations are applied.

Since W has the hard constraint that every element should be one of {-1, 0, 1}, learning is difficult: hard constraints make it hard to update the weights during back-propagation. To solve this, they proposed a continuous and differentiable parameterization of W:

W = tanh(w_hat) * sigmoid(m_hat)

w_hat and m_hat are randomly initialized weights that are convenient to learn with gradient descent. This parameterization guarantees that the elements of W are in the range [-1, 1] and biased to be close to -1, 0 or 1. Here "*" means element-wise multiplication.

NALU (Neural Arithmetic Logic Units)

NAC is able to handle addition and subtraction; to also handle multiplication and division they came up with NALU, which consists of two NAC sub-cells, one capable of addition/subtraction and the other of multiplication/division.

It consists of these five equations:
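a = W x
W = tanh(w_hat) * sigmoid(m_hat)
m = exp( W ( log(|x| + ϵ) ) )
g = sigmoid( G x )
y = g * a + (1 - g) * m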

Where,

  1. w_hat, m_hat and G are randomly initialized weights,
  2. ϵ is used to avoid the problem of log(0),
  3. x and y are the input and output layers respectively,
  4. g is the gate, which lies between 0 and 1.

Here the concept of a gate is added as the variable g: when g = 1 (on), the add/subtract sub-cell is used and the multiply/divide sub-cell is off, and vice-versa.

For addition and subtraction (a = matmul(x, W)), the sub-cell is identical to the original NAC, while the multiply/divide sub-cell operates in log space and is capable of learning to multiply and divide (m = exp(matmul(log(|x| + ϵ), W))).
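Putting the five equations together, here is a minimal sketch of a NALU layer in TensorFlow (the initializer choice is an assumption):

```python
import tensorflow as tf

class NALU(tf.keras.layers.Layer):
    def __init__(self, units, epsilon=1e-7, **kwargs):
        super(NALU, self).__init__(**kwargs)
        self.units = units
        self.epsilon = epsilon

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        for name in ('W_hat', 'M_hat', 'G'):
            setattr(self, name, self.add_weight(
                name=name, shape=(in_dim, self.units),
                initializer='glorot_uniform', trainable=True))
        super(NALU, self).build(input_shape)

    def call(self, x):
        W = tf.tanh(self.W_hat) * tf.sigmoid(self.M_hat)
        a = tf.matmul(x, W)                                # add/subtract cell
        m = tf.exp(tf.matmul(tf.math.log(tf.abs(x) + self.epsilon), W))  # mul/div cell
        g = tf.sigmoid(tf.matmul(x, self.G))               # learned gate
        return g * a + (1 - g) * m
```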

So, this NALU is capable of both extrapolation and interpolation.

Experiments performed with NAC and NALU models

In the paper, they also applied these concepts to different tasks to test the ability of NAC and NALU. They found NALU to be very useful in problems like:

  1. Learning tasks using different arithmetic functions (x+y, x-y, x-y+x, x*y, etc.)
  2. A counting task using a recurrent network, in which images of different digits are fed to the model and the output should count the number of digits of each type.
  3. A language-to-number translation task, in which an expression like "five hundred fifteen" is fed to the network and the output should be "515". Here NALU is applied with an LSTM model in the output layer.
  4. NALU is also used with reinforcement learning to track time in a grid-world environment.

Summary

We have seen that NAC and NALU can be applied to overcome the failure of neural networks to generalize numerical representations outside the range observed in the training data. If you have gone through this blog, you have seen that the NAC and NALU concepts are quite easy to grasp and apply. However, it cannot be said that NALU will be perfect for every task, so we have to see where it gives good results.

Referenced Research Paper: Neural Arithmetic Logic Units


Snake Game with Deep Learning

Developing a neural network to play a snake game usually consists of three steps.

  1. Training data generation
  2. Training neural network
  3. Testing

The full code can be found here.

In this tutorial, I will guide you to generate training data. To do this, first, we need to develop a snake game for which you can follow this blog.

Training data consists of inputs and corresponding outputs. Here, I have used the following inputs and outputs.

The input is comprised of 7 nodes:

  1. Is left blocked, or is there any obstacle to the left? (1 or 0)
  2. Is front blocked, or is there any obstacle in front? (1 or 0)
  3. Is right blocked, or is there any obstacle to the right? (1 or 0)
  4. Apple direction vector from snake (x)
  5. Apple direction vector from snake (y)
  6. Snake's current direction vector (x)
  7. Snake's current direction vector (y)

Our input data will look like this:
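For example, one illustrative training row (the values here are made up) could be [0, 0, 0, 0.707, 0.707, 1, 0]: no direction is blocked, the apple lies diagonally from the head, and the snake is currently moving along the x-axis.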

The output is comprised of 3 nodes:

  1. [1,0,0] will move the snake left
  2. [0,1,0] will keep the snake moving in the same direction
  3. [0,0,1] will move the snake right

Now the big question, how to generate this data? You can sit and play as many games as you can, but it is always good when you can generate data automatically. Let’s see how to do this.

Generating Training Data

Here I have generated the training data automatically. To do this I have used the angle between the snake and the apple; on the basis of that angle, I decide in which direction the snake should move. First, let's calculate these.

Calculating the angle between the snake and the apple:

To calculate the angle between snake and apple we only require two parameters, snake position and apple position.

In the following code, I have first calculated the snake's current direction vector and the apple's direction from the snake's current position. The snake's direction vector can be calculated by subtracting the 1st index of the snake's list from the 0th index (the head). And to calculate the apple's direction from the snake, just subtract the snake's 0th index from the apple's position.

Then normalize these direction vectors and calculate the angle with the help of the math library. The code is as follows:
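A sketch of this calculation (snake_position is the list of segments with the head at index 0):

```python
import math
import numpy as np

def angle_with_apple(snake_position, apple_position):
    # direction the snake is moving in: head minus the segment behind it
    snake_dir = np.array(snake_position[0]) - np.array(snake_position[1])
    # direction of the apple as seen from the snake's head
    apple_dir = np.array(apple_position) - np.array(snake_position[0])

    snake_dir = snake_dir / np.linalg.norm(snake_dir)
    norm = np.linalg.norm(apple_dir)
    if norm:
        apple_dir = apple_dir / norm

    # signed angle between the two unit vectors, scaled to [-1, 1]
    angle = math.atan2(
        apple_dir[1] * snake_dir[0] - apple_dir[0] * snake_dir[1],
        apple_dir[1] * snake_dir[1] + apple_dir[0] * snake_dir[0]) / math.pi
    return angle, snake_dir, apple_dir
```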

After calculating the angle, next thing is to decide in which direction snake should move.

Calculating direction according to the angle:

If the above-calculated angle is > 0, it means the apple is on the right side of the snake, so the snake should move to the right. For < 0, move left, and for = 0, continue in the same direction. I have used 1, -1 and 0 for right, left and front respectively.

I have used the following steps to get the correct button direction (up, down, right, left or 3, 2, 1, 0 respectively) for the next step of the snake.

  1. First, I have calculated the snake’s current direction.
  2. Then to turn the snake to the left or right direction, I have calculated left direction vector or right direction vector from snake’s current direction vector.
  3. Then I have converted the above-calculated direction vector into the button direction.

Now, for every step, the angle and the corresponding next direction are calculated and the snake moves according to that. For each step, the inputs and outputs are calculated and appended to a list of training data. To generate training data, we need to keep a record of the 7 inputs and 3 outputs for every step the snake takes. First, let's see how I have calculated the inputs for every step the snake takes.

  1. To check if the direction is blocked, we look one step ahead in each direction.
  2. Snake direction vector = Snake’s Head (0th index) – Snake’s 1st index
  3. Apple direction from the snake = Apple’s position – Snake’s head position (See the figure below)

For every step, the output is generated by first calculating the direction for the given snake and apple positions, using the angle between them. Then we need to convert our direction (-1, 0 or 1) into the output (Y), a one-hot vector. For every predicted direction we check whether that direction is blocked or not and create the training output (Y) accordingly. The code given below seems a bit longer, but it calculates our training data output (Y).
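A condensed sketch of that logic; is_blocked(snake, turn) is a hypothetical helper that returns True when taking that relative turn would kill the snake:

```python
def generate_output(direction, snake, is_blocked):
    # if the predicted relative turn is blocked, fall back to a free one
    if is_blocked(snake, direction):
        for turn in (-1, 0, 1):
            if not is_blocked(snake, turn):
                direction = turn
                break
    y = [0, 0, 0]
    y[direction + 1] = 1    # -1 -> [1,0,0], 0 -> [0,1,0], 1 -> [0,0,1]
    return y, direction
```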

Here, I have used 1000 games for generating the training data, each of which consists of 2000 steps. For every game, I have re-initialized the snake position, apple position and score. Then I created two empty lists, one for the input training data (X) and another for the output training data (Y); these will contain our whole training data. The code is as follows:
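A sketch of the generation loop; reset_game and play_step are hypothetical helpers standing in for the game code from the earlier blog:

```python
training_data_x, training_data_y = [], []

for game in range(1000):
    snake, apple, score = reset_game()
    for _ in range(2000):
        angle, snake_dir, apple_dir = angle_with_apple(snake, apple)
        direction = 1 if angle > 0 else (-1 if angle < 0 else 0)
        y, direction = generate_output(direction, snake, is_blocked)

        # 7 inputs: three blocked flags, apple direction, snake direction
        x = [int(is_blocked(snake, -1)), int(is_blocked(snake, 0)),
             int(is_blocked(snake, 1)),
             apple_dir[0], apple_dir[1], snake_dir[0], snake_dir[1]]
        training_data_x.append(x)
        training_data_y.append(y)

        snake, apple, score = play_step(snake, apple, score, direction)
        if snake is None:    # the snake died, start a new game
            break
```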

You might have got some feeling about the training data generation for the snake game with deep learning. In the next blog, we will use this data to train and test our neural network. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game with Deep Learning Part-2

This is the second part of the snake game with deep learning series. In the previous blog, we saw how to generate training data for the neural network. In this tutorial, we will train and test the neural network on the generated training data.

The full code can be found here.

Our neural network is comprised of 7 nodes in the input layer, 3 nodes in the final layer and some hidden layers.

Network Architecture:

Now it's time to choose the hidden layers and the corresponding hyperparameters. It has always been difficult to find the perfect neural network architecture. There are some algorithms that can help find the best network architecture for a neural network, like genetic algorithms, NAS, AutoML, etc. I have explained neural architecture search using a genetic algorithm in this blog.

In this blog, I have used the hit-and-trial method to find a network architecture. After some hits and trials, I found a workable architecture which consists of 2 hidden layers, one of 9 units and the other of 15 units. For the hidden layers I have used the non-linear function 'relu', and for the output layer I have used 'softmax'.

You can use different libraries to train this model, like Keras, tflearn, etc. Here I have used Keras. Our network architecture is as follows:
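A sketch of the described architecture:

```python
from keras.models import Sequential
from keras.layers import Dense

# 7 inputs -> 9 -> 15 -> 3 outputs, as described above
model = Sequential([
    Dense(9, activation='relu', input_shape=(7,)),
    Dense(15, activation='relu'),
    Dense(3, activation='softmax'),
])
```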

Train Neural Network

Our model is prepared; now it's time to train it. For training, we first need to compile the model and then call the model.fit() method, which will do the rest. Since our training data is a list, we first need to change it into a numpy array and then reshape it. The reason for this is that a sequential model in Keras expects a numpy array or sparse matrix of shape [n_samples, n_features].
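A sketch of this step (the optimizer and epoch count are assumptions):

```python
import numpy as np

X = np.array(training_data_x).reshape(-1, 7)
Y = np.array(training_data_y).reshape(-1, 3)

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, Y, batch_size=256, epochs=3)
```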

Now our model is trained on the generated training data. The next thing is to test it and see how much it has learned.

Test Snake Game

Now it's time to test our trained snake. To predict the direction, we feed our model with the input values, then use the predicted direction (left, straight or right) to take the next step in our test games. From the new position, we again predict the direction and move the snake. This continues until the snake dies or the steps are over.
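The prediction step inside the test loop might look like this (inputs is the same 7-value vector used during training):

```python
pred = model.predict(np.array(inputs).reshape(-1, 7))
direction = np.argmax(pred) - 1   # 0/1/2 -> -1/0/1 (left/straight/right)
```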

At last, we calculate the maximum and average score over all the games in our test set.

Now let's see how the neural network plays the snake game.

Summary

I have used 1000 training games and 2000 steps per game, which generated 1633235 training examples. Then I tested the model on 1000 games with 2000 steps per game, and got a highest score of 61 and an average score of 23.091. This score can vary, since we are using random positions for the food and the number of steps is fixed. You can also vary the clock speed as per your need.
You can try a different number of games, but then you may have to change your network architecture in order to prevent the model from biasing and overfitting.

Now you might have got some feeling about how a neural network plays the snake game. In the next blog, we will use a neural network trained with a genetic algorithm to play the snake game. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.