Tag Archives: keras

Keras Callbacks – History

When training a neural network, one of the most useful debugging aids is a plot of the cost against the number of iterations. It not only shows whether the optimizer is working properly but can also reveal overfitting. We can also tune the learning rate based on this relationship. Thus, one should always keep track of the loss and the accuracy metrics while training a neural network.

Fortunately, in Keras, we don’t need to write a single extra line of code to store all these values. Keras automatically keeps the record of all the events for each epoch. This includes loss and accuracy metrics for both training and validation sets (if used). This is done using the History callback which is automatically applied to every Keras model. This callback records all the events into a History object that gets returned by the fit() method.

How does this work?

First, at the onset of training, this creates an empty dictionary to store all the events. Then at every epoch end, all the events are appended into the dictionary. Below is the code for this taken from the Keras GitHub.
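To make the bookkeeping concrete, here is a simplified plain-Python sketch of the idea. It is a stand-in written for illustration, not the Keras source; the class name SimpleHistory and the per-epoch log values are made up.

```python
class SimpleHistory:
    """Toy stand-in for a History-style callback (not the Keras source)."""

    def on_train_begin(self):
        # Start each training run with empty records.
        self.epoch = []
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        # Append this epoch's index and every logged value to the records.
        logs = logs or {}
        self.epoch.append(epoch)
        for name, value in logs.items():
            self.history.setdefault(name, []).append(value)


# Hypothetical per-epoch logs, made up purely for illustration.
fake_logs = [
    {"loss": 0.9, "val_loss": 1.0},
    {"loss": 0.6, "val_loss": 0.8},
    {"loss": 0.4, "val_loss": 0.7},
]

record = SimpleHistory()
record.on_train_begin()
for epoch, logs in enumerate(fake_logs):
    record.on_epoch_end(epoch, logs)

print(record.epoch)                   # list of epoch indices
print(record.history["loss"])         # one loss value per epoch
print(sorted(record.history.keys()))  # names of the recorded events
```

Each key in the dictionary maps to a list with one entry per epoch, which is why the records are so easy to plot later.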

How to use this?

Since all the saved records are returned by the fit() method, we can simply store all the events in any variable. Here, I’ve used “record” as the variable name.

Now, using this record object, we can retrieve any information about the training process. For instance, “record.epoch” returns the list of epochs.

“record.history” returns the dictionary containing the event names as the dictionary keys and their values at each epoch in a list.

You can retrieve all the event names using the following command.

You can also get the information about the parameters used while fitting the model. This can be done using the following command.

Not only this, but one can also check which data is used as the validation data using the following command.

These are just a few of the functionalities available under the History callback. You can check more of these at the Keras GitHub.

Plot the training history

Since all the events are stored in a dictionary, one can easily plot these using any plotting library. Here, I’m using Matplotlib. Below is the code for plotting the loss curves for both training and validation sets.

Similarly, one can plot the accuracy plots. That’s all for History callback. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Keras Callbacks – EarlyStopping

One common problem that we face while training a neural network is overfitting. This refers to a situation where the model fails to generalize. In other words, the model performs poorly on the test/validation set as compared to the training set. Take a look at the plot below.

Clearly, after ‘t’ epochs, the model starts overfitting. This is clear from the increasing gap between the train and the validation error in the above plot. Wouldn’t it be nice if we stopped the training where the gap starts increasing? This will help prevent the model from overfitting. This method is known as Early Stopping. Some of the pros of using this method are

  • Prevents the model from overfitting
  • Parameter-free unlike other regularization techniques like L2 etc.
  • Removes the need to manually set the number of epochs, because the model will automatically stop training when the monitored quantity stops improving.

Fortunately, in Keras, this is done using the EarlyStopping callback. So, let’s first discuss its Keras API and then we will learn how to use this.

Keras API

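Here, you first need to specify which quantity to monitor using the “monitor” argument. This can take a value from ‘loss’, ‘acc’, ‘val_loss’, ‘val_acc’ or ‘val_metric’ where metric is the name of the metric used. For instance, if the metric is set to ‘mse’ then pass ‘val_mse’. The exact key names depend on the Keras version and on how the metric was named at compile time (for example, ‘accuracy’ and ‘val_accuracy’ in newer versions).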

After setting the monitored quantity, you need to decide whether you want to minimize or maximize it. For instance, we want to minimize loss and maximize accuracy. This can be done using the “mode” argument. This can take value from [‘min‘, ‘max‘, ‘auto‘]. Default is the ‘auto’ mode. In ‘auto’ mode, this automatically infers whether to maximize or minimize depending upon the monitored quantity name.

This stops training whenever the monitored quantity stops improving. By default, any fractional change is considered an improvement. For instance, if ‘val_acc’ increases from 90% to 90.0001%, this is also considered an improvement. The meaning of improvement may vary from one application to another. So, here we have an argument “min_delta“. Using this we can set the minimum change in the monitored quantity that qualifies as an improvement. For instance, if min_delta=1, any absolute change of less than 1 counts as no improvement.

Note: This difference is calculated as the current monitored quantity value minus the best-monitored quantity value until now.

As we already know, neural networks often run into plateaus. So the monitored quantity may not show improvement for some time and then improve afterward. So, it’s better to wait for a few epochs before making the final decision to stop the training process. This can be done using the “patience” argument. For instance, patience=3 means that if the monitored quantity doesn’t improve for 3 epochs, the training process is stopped.

The model will stop training some epochs (specified by the “patience” argument) after the best-monitored quantity value. So, the weights you will get are not the best weights. To retrieve the best weights, set the “restore_best_weights” argument to True.
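To see how min_delta and patience interact, here is a small plain-Python sketch of the stopping rule described above. It is a simplified stand-in written for illustration, not the Keras implementation; the function name early_stopping_epoch and the validation losses are made up.

```python
def early_stopping_epoch(values, mode="min", min_delta=0.0, patience=0):
    """Return (stopped_epoch, best_epoch) for a list of monitored values.

    Simplified stand-in for the stopping rule; not the Keras source.
    stopped_epoch is None if training runs to the last epoch.
    """
    # Flip the sign in "min" mode so that a higher score is always better.
    sign = -1.0 if mode == "min" else 1.0
    best = None
    best_epoch = None
    wait = 0
    for epoch, value in enumerate(values):
        score = sign * value
        # An improvement must beat the best score so far by more than min_delta.
        if best is None or score - best > min_delta:
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation losses, one per epoch.
val_loss = [1.00, 0.80, 0.79, 0.81, 0.80, 0.795, 0.70]

print(early_stopping_epoch(val_loss, mode="min", min_delta=0.05, patience=3))   # (4, 1)
print(early_stopping_epoch(val_loss, mode="min", min_delta=0.05, patience=10))  # (None, 6)
```

In the first call, training stops at epoch 4 although the best value was seen at epoch 1, which is exactly the situation where restoring the best weights matters. With a larger patience, the run survives the plateau and reaches the better value at epoch 6.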

Sometimes we have a baseline in mind for a task, for example, at least 75% accuracy within 5 epochs. If the model does not reach it, there is no point in training any further; instead, change the hyperparameters and retrain the model. You can set this threshold using the “baseline” argument. If the monitored quantity minus min_delta does not surpass the baseline within the number of epochs specified by the patience argument, the training process is stopped.

For instance, below is an example where the baseline is set to 98%.

The training process stops because val_acc – min_delta < baseline throughout the patience interval (3 epochs). This is shown below.

After surpassing the baseline, the Early Stopping callback will work as normal i.e. stop training when the monitored quantity stops improving.

Note: If you are not sure about the baseline in your task, just set this argument to None.

I hope you get some feeling about the EarlyStopping callback. Now let’s see how to use this.

How to use this?

Firstly, you need to create an instance of the “EarlyStopping” class as shown below.

Then pass this instance in the list while fitting the model.

That’s all for Early Stopping. Hope you enjoy reading.

ImageDataGenerator – apply_transform method

In this blog, we will discuss the ImageDataGenerator “apply_transform” method. Using this method, you can apply any desired transformation to an image. It is used internally by the ImageDataGenerator “flow” method. First of all, let’s discuss its Keras API.

Keras API

This applies transformations to x (3D tensor) according to the transform parameters specified.

The “transform_parameters” is a dictionary specifying the set of transformations to be applied. Only the following transformations are available

Let’s discuss these in detail.

theta: Rotation angle in degrees. Below is an example that rotates the image by 40 degrees.

tx and ty: These are the shifts in the vertical and the horizontal directions respectively. For instance, tx=20 will shift the image vertically by 20 pixels.

Here, the translation matrix is calculated first. Then the affine transformation is applied using the “scipy.ndimage” affine_transform function.

ty = 20

zx and zy: These zoom the image in the vertical and horizontal directions respectively. If the value is less than 1, the image is zoomed in; otherwise, it is zoomed out.

Note: Negative values of zx and zy flip the image in the vertical and horizontal directions respectively. For instance, zx=-1 will flip the image vertically.

flip_horizontal and flip_vertical: This flips the image horizontally and vertically. For instance, below is the code for flipping the image horizontally,
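As a rough illustration of what the two flips do to the pixel grid, here is a NumPy sketch. It assumes a channels-last image of shape (rows, cols, channels) and uses plain array slicing rather than the Keras method itself.

```python
import numpy as np

# A tiny 2x3 single-channel "image" with shape (rows, cols, channels).
image = np.arange(6).reshape(2, 3, 1)

flipped_horizontal = image[:, ::-1, :]  # reverse the column (width) axis
flipped_vertical = image[::-1, :, :]    # reverse the row (height) axis

print(image[..., 0])
print(flipped_horizontal[..., 0])
print(flipped_vertical[..., 0])
```

A horizontal flip mirrors each row left to right, while a vertical flip swaps the order of the rows.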

channel_shift_intensity: This shifts the channel values by the amount specified. The following code sums up how it works
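A rough NumPy sketch of the idea, assuming the shift is a constant added to every channel and the result is clipped to the image’s original minimum and maximum. The helper name shift_channels is hypothetical and the pixel values are made up.

```python
import numpy as np

def shift_channels(x, intensity):
    """Add `intensity` to every value, clipping to the image's original range."""
    min_x, max_x = x.min(), x.max()
    return np.clip(x + intensity, min_x, max_x)

image = np.array([[[10.0, 100.0, 200.0]]])  # one pixel, three channels
shifted = shift_channels(image, 60.0)
print(shifted)
```

Because of the clipping, values that are already at the top of the range (200 here) stay where they are, while the others move up by the shift.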

brightness: This controls the brightness of the image. An enhancement factor of 0.0 gives a black image. A factor of 1.0 gives the original image.

Hope you understand all the arguments. Now, let’s see how to use this.

How to use this?

Since “apply_transform” is a method of the ImageDataGenerator class, one first needs to create an instance of this class and then call the method as shown below

Hope you enjoy reading.

ImageDataGenerator – get_random_transform method

In the previous blog, we discussed how to generate batches of augmented data using the flow method. We also learned that the key ingredient in the flow method is the “get_random_transform” method. This generates random parameters for a transformation. So, in this blog, let’s discuss this method in detail.

Keras API

Now, let’s see how this generates random parameters for transformations by just using the image shape information.

How does this work?

This borrows the parameters from the ImageDataGenerator class. For instance, if we define rotation range in the ImageDataGenerator class

then the random parameters for this are obtained as

Thus, whenever you generate examples, theta is obtained from the uniform distribution, specified according to the parameters provided in the ImageDataGenerator class.
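As a small sketch of this sampling step, assuming theta is drawn uniformly from [-rotation_range, rotation_range] and defaults to 0 when no range is given. The helper sample_rotation is a hypothetical name, and the sketch uses NumPy directly rather than the generator class.

```python
import numpy as np

def sample_rotation(rotation_range, rng):
    """Draw a random rotation angle (in degrees) for one example."""
    if rotation_range:
        theta = rng.uniform(-rotation_range, rotation_range)
    else:
        theta = 0.0  # no rotation requested, so keep the default
    return {"theta": float(theta)}

rng = np.random.default_rng(seed=0)
params = [sample_rotation(40, rng) for _ in range(5)]
print(params)
print(sample_rotation(0, rng))
```

Every call gives a different angle, which is why each generated example ends up rotated by a different amount.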

Similarly, for every transformation provided in the ImageDataGenerator class, we can obtain the random parameters. For more details, refer to the Keras GitHub.

How to use this?

To use this, you first need to provide the transformations in the ImageDataGenerator class. For instance, if I just want to rotate the image, then first specify the parameters as

Now, to get the random parameters, call the “get_random_transform” method as

This outputs the following parameters dictionary as

Notice that only the transformations specified in the ImageDataGenerator class are affected, i.e. only the theta value has changed. All the other values are the defaults.

So, this way one can generate random parameters for transformations. Hope you enjoy reading.

ImageDataGenerator – standardize method

In this blog, we will discuss ImageDataGenerator “standardize” method. This method performs in-place normalization to the batch of inputs. As already discussed, this is an important step in the flow method or data augmentation. So, let’s discuss it in detail.

Keras API

Here, x is the batch of inputs. This method returns the normalized inputs. Note that x is changed in-place. If you don’t want to change the inputs in-place, pass a copy of the input to this method.

How does this work?

While performing data augmentation with ImageDataGenerator, we discussed different normalization techniques. These techniques include centering the entire distribution or a sample, rescaling the input, performing zca whitening, etc. Behind the scenes, these are implemented by the “standardize” method. Let’s see how.

For instance, we want to rescale the input by 1/255. So, first of all we will create an ImageDataGenerator instance as shown below

Then for data augmentation, we will use the flow method as

Thus, the training_generator will yield batches of augmented images. That’s all we usually do.

As already discussed in this blog, the flow method consists of three steps, of which the last step is the “standardize” method. All the normalization work in the ImageDataGenerator class is handled by this method.

Now, coming back to the above example, the “standardize” method will first check whether you want to rescale or not. If yes, then this will change the input in-place as shown below.

Similarly, this method performs featurewise_center or samplewise_center or any other normalization. For more details, refer to Keras Github.

How to use this?

First of all, create an ImageDataGenerator instance with the desired transformations. Then apply the “standardize” method as shown below.

Note: The standardize method only supports transformations that perform normalization such as featurewise_center, rescale, etc. Otherwise, this returns the same image or batch of inputs.

What does in-place mean?

As already discussed, this method normalizes the inputs in-place. This is exactly what in-place operators in Python do. Let’s take an example to understand what in-place means.

For instance, let’s rescale an image of all ones by 2. After the “standardize” method, see how the mean of the “images” changes.
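The same effect can be reproduced with plain NumPy. The sketch below uses a hypothetical helper, rescale_in_place, as a stand-in for the rescale step; it is not the “standardize” method itself.

```python
import numpy as np

def rescale_in_place(x, rescale):
    """Multiply x by `rescale`, overwriting x itself."""
    x *= rescale  # in-place operator: no new array is created
    return x

images = np.ones((1, 4, 4, 3), dtype="float32")
print(images.mean())   # mean of the original array before the call

out = rescale_in_place(images, 2.0)
print(images.mean())   # the original array itself has been modified
print(out is images)   # the returned array is the very same object

# Passing a copy keeps the original data untouched.
untouched = np.ones((1, 4, 4, 3), dtype="float32")
safe = rescale_in_place(untouched.copy(), 2.0)
print(untouched.mean(), safe.mean())
```

The mean of “images” goes from 1 to 2 even though we never assigned to it, while “untouched” keeps its original values because only a copy was handed to the function.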

That’s all for “standardize” method. Hope you enjoy reading.

ImageDataGenerator – random_transform method

In the previous blog, we discussed how to generate random parameters for a transformation. In this blog, we will discuss how to apply a random transformation to an image.

Keras API

This function returns a randomly transformed version of the input image x.

How does this method work?

  • First of all, this generates random parameters for a transformation using the “get_random_transform” method. For more details, refer to this blog.
  • Then the image is transformed according to the parameters (generated above) using the “apply_transform” method. For more details, refer to this blog.

Below is the code for this (taken from Keras)
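The two steps can be outlined in a few lines of NumPy. This is a simplified stand-in rather than the Keras source: the only transformation is a random horizontal flip, and the function names are hypothetical.

```python
import numpy as np

def get_random_params(rng, horizontal_flip=True):
    # Step 1: draw random parameters (here only a coin toss for a horizontal flip).
    return {"flip_horizontal": bool(horizontal_flip and rng.random() < 0.5)}

def apply_params(x, params):
    # Step 2: apply the transformation described by the parameters.
    if params.get("flip_horizontal", False):
        x = x[:, ::-1, :]
    return x

def random_transform_sketch(x, rng):
    # Chain the two steps: sample parameters, then transform the image.
    params = get_random_params(rng)
    return apply_params(x, params)

rng = np.random.default_rng(seed=0)
image = np.arange(12).reshape(2, 3, 2)  # shape (rows, cols, channels)
result = random_transform_sketch(image, rng)
print(result.shape)
```

Keeping the sampling and the application separate means the same parameters could also be reused, for example to transform an image and its mask identically.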

How to use this?

To use this, you first need to provide the desired transformations in the ImageDataGenerator class. For instance, let’s say we just want to zoom the image. Firstly, we specify the parameters in the ImageDataGenerator class.

Then we apply the random_transform method.

Similarly, you can apply any random transformation to the image. Just specify the transformations in the ImageDataGenerator class. Hope you enjoy reading.

ImageDataGenerator – fit method

In the previous blog, we discussed how to perform data augmentation using ImageDataGenerator. In that, we saw that some transformations require statistics of the entire dataset. These transformations include featurewise_center, featurewise_std_normalization and zca_whitening.

To calculate these statistics, first of all, one may need to load the entire dataset into the memory. Then calculate the mean, standard deviation, principal components or any other statistics from that data. Fortunately, Keras has a built-in fit method for doing this. Let’s discuss it in detail.

Keras API

Here, x is the data from which to calculate the statistics. Should have rank 4. Note that the channel axis of x should have value either 1, 3, or 4 depending upon whether the data is greyscale, RGB, or RGBA.

This also provides an option of whether to use the augmented data for calculating statistics or not. This is done using the “augment” argument. If True, then augmented examples are also used for calculating statistics. The number of augmented examples depends upon the “rounds” parameter. For instance, if “rounds=2” and x.shape[0] or data size is 64, then 128 augmented examples are used.

The code below shows its implementation (taken from Keras). First, an array of zeros is created to hold the augmented examples. Then the augmented examples are generated using the random_transform method and written into this array.
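A simplified NumPy sketch of this step, with the “rounds” arithmetic from the example above. The helper build_augmented_set and the flip used as the transform are stand-ins for illustration, not the Keras source.

```python
import numpy as np

def build_augmented_set(x, rounds, transform):
    """Collect rounds * len(x) transformed examples into one array."""
    # Pre-allocate an array of zeros that can hold every augmented example.
    ax = np.zeros((rounds * x.shape[0],) + x.shape[1:], dtype=x.dtype)
    for r in range(rounds):
        for i in range(x.shape[0]):
            ax[i + r * x.shape[0]] = transform(x[i])
    return ax

# Synthetic data standing in for a real training set of 64 images.
x = np.random.default_rng(0).random((64, 8, 8, 1)).astype("float32")

# A horizontal flip stands in for the random transformation.
augmented = build_augmented_set(x, rounds=2, transform=lambda img: img[:, ::-1, :])
print(augmented.shape)  # (128, 8, 8, 1)
```

With 64 examples and rounds=2, the array holds 128 augmented examples, and the statistics are then computed on it.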

Here, x is the training data or the data whose statistics we want to calculate. Once we have the data, we can easily calculate the statistics such as mean, standard deviation and principal components using Numpy and Scipy libraries.

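Note: In the Keras implementation, the mean and standard deviation are computed over the sample, row and column axes, which gives one value per channel. For single-channel data such as MNIST, this is the same as the mean over the whole array.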

Now, when we generate batches of augmented data using any method (like flow) these statistics are used to normalize the data as shown below

How to use this?

Let’s take the MNIST digit classification example. Suppose we want to center the distribution i.e. mean equal to 0. For this, we will use the ImageDataGenerator “featurewise_center” transformation. Firstly, load the data and preprocess it.

After loading the data, firstly, create an ImageDataGenerator instance. Then fit the training data as shown below

Let’s calculate the mean of the training data manually and using “datagen” mean attribute.

As expected these should be the same i.e 33.318447. Now, let’s see what happens to the mean of the distribution after normalization.

Clearly, this centers the distribution. Similarly, we can perform other types of normalizations also.

Hope you enjoy reading.

Keras ImageDataGenerator Normalization at validation and test time

Note: This blog should not be confused with Test time augmentation (TTA).

In the previous blogs, we discussed different operations that are available for image augmentation under the ImageDataGenerator class, for instance, rotation, translation, zoom, shearing, normalization, etc. With these, our model is exposed to more aspects of the data and thus generalizes better.

But what about validation and prediction time? Since both of these are used to evaluate the model, we want them to be fixed. That is why we don’t apply any random transformation to the validation and test data. But the test and the dev sets should come from the same distribution as the train set. In other words, the test and the dev sets should be normalized using the statistics calculated on the train set.

Since normalization in Keras is done using the ImageDataGenerator class, in this blog we will discuss how to normalize the data during prediction using this class.

Method-1

We create a separate ImageDataGenerator instance and then fit it on the train data as shown below.

Similarly, we can do this for the test set. Because we need to fit a generator on the train data again for both the validation and the test set, this is very time-consuming.

Method-2

We use the “standardize” method provided under the ImageDataGenerator class. As already discussed, the “standardize” method performs in-place normalization to the batch of inputs, which makes it perfect for this work. You can read more about normalization here.

Method-3

This is similar to the above method but is more explicit. In this we obtain the mean and the standard deviation from the generator and apply the desired normalization.
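The explicit version boils down to a couple of array operations. In the NumPy sketch below, the mean and standard deviation are computed directly from synthetic single-channel training data; in practice they would be read from the fitted generator. The variable names and the small epsilon added to the denominator are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic single-channel data standing in for real train and validation sets.
x_train = rng.normal(loc=33.0, scale=78.0, size=(200, 28, 28, 1))
x_val = rng.normal(loc=33.0, scale=78.0, size=(50, 28, 28, 1))

# Statistics come from the training data only.
train_mean = x_train.mean()
train_std = x_train.std()

# Apply the training statistics to the validation data (no random transforms).
x_val_norm = (x_val - train_mean) / (train_std + 1e-6)

print(x_val_norm.shape)
print(round(float(x_val_norm.mean()), 2), round(float(x_val_norm.std()), 2))
```

The validation mean ends up close to 0 and the standard deviation close to 1, but not exactly, because the statistics belong to the train set and not to the validation set itself.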

I hope you now have some idea of how to apply normalization during prediction time. Hope you enjoy reading.

Keras Callbacks – ModelCheckpoint

In this blog, we will discuss how to checkpoint your model in Keras using the ModelCheckpoint callback. Check-pointing your work is important in any field. If by chance any problem or failure occurs, you don’t need to restart your work from zero; just resume from that checkpoint. This is very important in the field of deep learning, where training can take days. So, let’s see how to use this.

Keras Function

Keras provides a built-in callback class for model check-pointing as

Let’s discuss in detail each of its arguments:

filepath: This is the path to save your model. Depending on the filepath specified, we can either save only the best model or save models at every epoch. Let’s see what this means.

If you specify a fixed filepath, for example, ‘D:/best_model.hdf5’, each save overwrites the previously saved model, and what you end up with is the best model up to that epoch.

If you specified a dynamic filepath, say, ‘D:/model{epoch:02d}.hdf5’, this will save the model at every epoch. For instance, for epoch 22, the model will be saved as model22.hdf5. You can only use variables like ‘epoch’ or keys in logs during training such as ‘loss’, ‘acc’, ‘val_loss’ and ‘val_acc’ for formatting the filepath. For example, ‘D:/model-{epoch:02d}-{val_acc:.2f}.hdf5’ is a valid filepath.
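The placeholders follow Python’s standard string formatting, so you can check what a template will produce before training. In this sketch the epoch number and log values are made up, and the drive prefix is left out.

```python
# Template with placeholders for the epoch number and a logged metric.
template = "model-{epoch:02d}-{val_acc:.2f}.hdf5"

# Hypothetical logs for one epoch; keys that the template does not use are ignored.
logs = {"loss": 0.1234, "val_acc": 0.9731}

filename = template.format(epoch=22, **logs)
print(filename)  # model-22-0.97.hdf5

# A fixed path has no placeholders, so every epoch maps to the same file name.
fixed = "best_model.hdf5".format(epoch=22, **logs)
print(fixed)  # best_model.hdf5
```

A template that refers to a key missing from the logs (for example ‘val_acc’ when no validation data is used) raises a KeyError, so make sure the names match what is actually logged.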

monitor: This is the quantity to monitor. This can take one of the values from ‘loss’, ‘acc’, ‘val_loss’ and ‘val_acc’.

verbose: This controls whether information about model saving is displayed or not. This is either 0 or 1. If 0, nothing is displayed; if 1, something like this is displayed depending on the behavior of the monitored quantity.

save_best_only: If set to False, the model is saved after every epoch whether the monitored quantity increases or decreases. Otherwise, the model is saved depending on the ‘mode’ argument.

mode: This can take one of the values from auto, min, max. For instance, if the mode is ‘max’ and ‘val_acc’ is the monitored quantity, then for save_best_only = True the model will be saved only when ‘val_acc’ improves, otherwise, the model will not be saved at that epoch. For ‘val_loss’, this should be min. ‘auto’ mode automatically decides the direction depending on the monitored quantity.

save_weights_only: if True, then only the model weights will be saved otherwise the full model will be saved.

period: The callback will be applied after the specified period (no. of epochs)

How to use this?

  • All the callbacks are available in the keras.callbacks module, so first import the ModelCheckpoint class from this module.
  • Then create an instance with the arguments set as required.
  • Now, to apply this, pass the instance inside a list to the “callbacks” argument of the .fit() method.

Let’s take MNIST classification example to understand this

Import Libraries

Data Loading and Pre-processing

Build Model

Callbacks

Fit Model

Load Weights and Evaluate test set

Hope you enjoy reading.

Binary Classification

In this blog, we will learn how to perform binary classification using Convolutional Neural Networks. Here, we will be using the classic dogs vs cats dataset, where we have to classify an image as belonging to one of these two classes. So, let’s get started.

Downloading the Dataset

This dataset was made available as a part of the Kaggle competition in 2013. You can download it from here. This dataset contains 25,000 labeled images of dogs and cats in the train folder and 12,500 unlabeled images in the test folder. The size of the images in the dataset is not the same. Some samples from the dataset are shown below

Since the competition is now closed, we can’t submit the test predictions to Kaggle. Thus, to know how well we are doing, we will make the test data from the 25,000 labeled images.

Preparing the Data

Here, we will split the train folder into 20,000 images for training, and 2500 each for validation and testing. For this, we will create 3 folders, one each for the train, validation and test sets. In each of these folders, we will create 2 sub-folders, cats and dogs. You can do this manually but here we will be using the Python os module. The code for creating folders and sub-folders is shown below
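A minimal sketch of the folder creation with the os module. A temporary directory stands in for the dataset’s base path here, so replace base_dir with your own location; the folder names follow the description above.

```python
import os
import tempfile

# A temporary directory stands in for the dataset's base path (hypothetical).
base_dir = tempfile.mkdtemp()

# One folder per split, each with a sub-folder per class.
for split in ("train", "validation", "test"):
    for label in ("cats", "dogs"):
        os.makedirs(os.path.join(base_dir, split, label), exist_ok=True)

print(sorted(os.listdir(base_dir)))                         # ['test', 'train', 'validation']
print(sorted(os.listdir(os.path.join(base_dir, "train"))))  # ['cats', 'dogs']
```

Using exist_ok=True makes the snippet safe to run more than once.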

The above code will create folders and sub-folders in the original path specified above. Now, we will put the images in these folders. The below code places 20,000 images in train, 2500 each in the validation and test folder created above.
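One way to do the split is sketched below. It assumes the images are named like cat.0.jpg and dog.0.jpg, and the helper split_class is a hypothetical name. The demo runs on ten empty placeholder files in temporary folders instead of the real images.

```python
import os
import shutil
import tempfile

def split_class(src_dir, dst_root, prefix, label, n_train, n_val, n_test):
    """Copy '<prefix>.<i>.jpg' files into train/validation/test sub-folders."""
    bounds = {
        "train": range(0, n_train),
        "validation": range(n_train, n_train + n_val),
        "test": range(n_train + n_val, n_train + n_val + n_test),
    }
    for split, indices in bounds.items():
        dst_dir = os.path.join(dst_root, split, label)
        os.makedirs(dst_dir, exist_ok=True)
        for i in indices:
            name = f"{prefix}.{i}.jpg"
            shutil.copyfile(os.path.join(src_dir, name), os.path.join(dst_dir, name))

# Small self-contained demo with empty placeholder files instead of real images.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
for i in range(10):
    open(os.path.join(src, f"cat.{i}.jpg"), "w").close()

split_class(src, dst, prefix="cat", label="cats", n_train=6, n_val=2, n_test=2)

for split in ("train", "validation", "test"):
    print(split, len(os.listdir(os.path.join(dst, split, "cats"))))
```

For the real dataset, the same call would use n_train=10000, n_val=1250 and n_test=1250 for each class, which gives the 20,000 / 2500 / 2500 split described above.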

Now, let’s display a sample image from say “train_cats_dir”. This is done using the following code.

Data Pre-processing

The data must be processed in an appropriate form before feeding in the neural network. This includes changing the data into numpy arrays, normalizing the values between 0 and 1 or any other suitable range, etc. This can be easily done using the keras ImageDataGenerator class. This is shown in the code below

Here, we will use the flow_from_directory method to generate batches of data.

Build Model

Since this is a binary classification problem, we use the sigmoid activation function in the last layer. The model architecture we will use is shown below.

For the compilation step, we will use the Adam optimizer with the binary crossentropy loss.

Callbacks

To have some control over the training, one should use callbacks. Here, we will be using the ModelCheckpoint callback, which saves the model weights whenever the validation accuracy improves.

Fit Model

Visualize Training

Let’s visualize how the loss and accuracy vary during the training process. This is done using the History() object as shown below

Clearly, our model starts overfitting after 8 epochs. We know that to prevent overfitting, we can

  • Perform Data Augmentation
  • Use Dropout
  • Reduce the capacity of the network
  • Use Regularization etc.

So, let’s use Data Augmentation and Dropout and see how our model performs.

Data Augmentation

In this, we produce more examples from the existing examples by various operations such as rotating, translating, flipping, etc. Fortunately, in Keras, all these transformations can be performed using ImageDataGenerator class. Below is the code for this

Dropout

In this, we randomly turn off some neurons (setting their outputs to zero) during training. This can be easily implemented using the Keras Dropout layer. Here, I’ve just added a single Dropout layer before the Dense layer, with a dropout rate of 50%. The rest of the model is the same as above.

Just change the train_datagen and add Dropout layer. Then, train the model the same as we did above. Let’s visualize the training process

Clearly, by looking at the plots, we are no longer overfitting. Just by adding a Dropout layer and augmentation, we have increased the accuracy from 87 to 95%. You can further improve the accuracy by using a pre-trained model and other regularization methods.

Hope you enjoy reading.
