
Multi-Label Classification

In the previous blogs, we discussed binary and multi-class classification problems. The two are closely related: the basic assumption underlying both is that each image contains only one class. For instance, for the dogs vs cats classification, it was assumed that an image can contain either a cat or a dog, but not both. So, in this blog, we will discuss the case where more than one class can be present in a single image. This type of classification is known as Multi-label classification. The picture below explains this concept beautifully.

Source: cse-iitk

Some of the most common techniques for solving multi-label classification problems are

  • Problem Transformation
  • Adapted Algorithm
  • Ensemble approaches

Here, we will only discuss Binary Relevance, a method that falls under the Problem Transformation category. If you are curious about the other methods, you can read this amazing review paper.

In binary relevance, we break the problem into a number of binary classification problems: for each available class, we ask whether it is present in the image or not. As we already know, binary classification uses ‘sigmoid‘ as the last-layer activation function and ‘binary_crossentropy‘ as the loss function, so we will use the same here. Everything else stays the same.
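
As a quick sketch (the convolutional body and the class count of 25, matching the genre problem below, are placeholders), a binary-relevance model in Keras looks like this:

from keras import layers, models

# One sigmoid unit per class: each output answers the independent
# question "is this class present in the image?"
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(400, 300, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(25, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')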

Now, let’s take a dataset and see how to implement multi-label classification.

Problem Definition

Here, we will take the classic problem of movie genre classification from poster images. A movie can belong to more than one genre, for instance, comedy and romance, and hence this is a multi-label classification problem.

Dataset

You can download the original dataset from here. This contains two files.

  • Movie_Poster_Dataset.zip – The poster images
  • Movie_Poster_Metadata.zip – Metadata of each poster image like ID, genres, box office, etc.

To prepare the dataset, we need images and corresponding genre information. For this, we need to extract the genre information from the Movie_Poster_Metadata.zip file corresponding to each poster image. Let’s see how to do this.

Note: This dataset contains some missing items. For instance, compare the “1982” folder in Movie_Poster_Dataset.zip and Movie_Poster_Metadata.zip: the number of poster images does not match the number of metadata entries, and the genre information is missing for some movies. So, we need to perform some EDA and remove these files.

Steps to perform EDA:

  1. First, we will extract the movie name and corresponding genre information from the Movie_Poster_Metadata.zip file and create a Pandas dataframe using these.
  2. Then we will loop over the poster images in the Movie_Poster_Dataset.zip file and check if it is present in the dataframe created above. If the poster is not present, we will remove that movie from the dataframe.

These two steps will ensure that we are only left with movies that have poster images and genre information. Below is the code for this.

Because the encoding differs across the metadata files, the code needs two passes, one per encoding (hence the two for loops). Below are the steps performed in the code.

  • First, open the metadata file
  • Read line by line
  • Extract the information corresponding to the ‘Genre’ and ‘imdbID’
  • Append them into the list and create a dataframe
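
A sketch of this first step is shown below. The folder layout and the exact field format of the metadata files (JSON-like records with “imdbID” and “Genre” fields) are assumptions here:

import glob
import pandas as pd

ids, genres = [], []
for filename in glob.glob('Movie_Poster_Metadata/groundtruth/*.txt'):
    for encoding in ('utf-8', 'utf-16'):          # one pass per encoding
        try:
            with open(filename, encoding=encoding) as f:
                for line in f:
                    if '"imdbID"' in line:
                        ids.append(line.split(':', 1)[1].strip(' ",\n'))
                    elif '"Genre"' in line:
                        raw = line.split(':', 1)[1].strip(' ",[]\n')
                        genres.append([g.strip(' "') for g in raw.split(',')])
            break                                  # parsed successfully
        except UnicodeError:
            continue                               # retry with the next encoding

df = pd.DataFrame({'imdbID': ids, 'Genre': genres})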

Now for the second step, we first collect all the poster image filenames in a list.

Then we check whether each movie in the dataframe has a corresponding poster. If not, we drop those rows from the dataframe (or, equivalently, build a new dataframe from the matches).

Finally, be sure that there are no duplicates left in the dataframe. A sketch of this step is shown below.
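
Continuing the sketch above (the folder names are again assumptions):

import os

poster_files = []
for folder in os.listdir('Movie_Poster_Dataset'):             # '1980', '1981', ...
    for f in os.listdir(os.path.join('Movie_Poster_Dataset', folder)):
        poster_files.append(os.path.splitext(f)[0])           # e.g. 'tt0080339'

# Keep only the movies whose poster image actually exists, then de-duplicate.
df = df[df['imdbID'].isin(poster_files)]
df = df.drop_duplicates(subset='imdbID').reset_index(drop=True)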

So, finally, we are ready with our cleaned dataset of 8052 images covering 25 classes overall. The dataframe is shown below.

Format 1

One can also convert this dataframe into the common format as shown below

Format 2

This can be done using the following code.
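
One way to do this is with scikit-learn’s MultiLabelBinarizer (using scikit-learn here is an assumption; a few lines of plain pandas would work just as well):

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
onehot = pd.DataFrame(mlb.fit_transform(df['Genre']),
                      columns=mlb.classes_, index=df.index)
df_format2 = pd.concat([df['imdbID'], onehot], axis=1)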

In this post, we will be using Format 1, but you can use either. Here, we will be using the Keras flow_from_dataframe method. For this, we need to place all the images under one directory. Currently, the images sit in separate folders such as 1980, 1981, etc. Below is the code that places all the poster images in a single folder, ‘original_train‘.
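
A sketch, assuming the same folder names as above:

import os
import shutil

os.makedirs('original_train', exist_ok=True)
for folder in os.listdir('Movie_Poster_Dataset'):
    src = os.path.join('Movie_Poster_Dataset', folder)
    for f in os.listdir(src):
        shutil.copy(os.path.join(src, f), 'original_train')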

Model Architecture

Since this is a sparse multi-label classification problem, accuracy is not a good metric for it. The reason is shown below.

If the predicted output was [0, 0, 0, 0, 0, 1] and the correct output was [0, 0, 0, 0, 0, 0], the element-wise accuracy would still be 5/6.

So, you can use other metrics like precision, recall, F1 score, Hamming loss, top_k_categorical_accuracy, etc.

Here, I’ve used both accuracy and top_k_categorical_accuracy to show how accuracy jumps past 90 from the very first epoch and is therefore not the right metric.
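
A sketch of the compile step (the optimizer choice is illustrative; top_k_categorical_accuracy uses k=5 by default):

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy', 'top_k_categorical_accuracy'])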

flow_from_dataframe()

Here, I split the data into training and validation sets using the validation_split argument of ImageDataGenerator, as sketched below. You can read more about the ImageDataGenerator here.
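
A sketch, assuming the dataframe built above (the image size and batch size are illustrative):

from keras.preprocessing.image import ImageDataGenerator

df['filename'] = df['imdbID'] + '.jpg'       # flow_from_dataframe needs filenames

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# With list-valued labels, class_mode='categorical' produces a multi-hot
# vector over all 25 genres for every image.
train_generator = datagen.flow_from_dataframe(
    df, directory='original_train',
    x_col='filename', y_col='Genre',
    target_size=(400, 300), class_mode='categorical',
    batch_size=32, subset='training')

validation_generator = datagen.flow_from_dataframe(
    df, directory='original_train',
    x_col='filename', y_col='Genre',
    target_size=(400, 300), class_mode='categorical',
    batch_size=32, subset='validation')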

Below are some of the poster images, all resized to (400, 300, 3).

You can also check which labels are assigned to which class using the following code.
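
For instance, with the train_generator from the sketch above:

print(train_generator.class_indices)
# e.g. {'Action': 0, 'Adventure': 1, 'Animation': 2, ...}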

This prints a dictionary containing class names as keys and labels as values.

Let’s start training…
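
A sketch of the training call, with illustrative epoch and step counts:

model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // 32,
    epochs=20,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // 32)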

See how accuracy reaches 90+ within a few epochs. As stated earlier, this is not a good evaluation metric for multi-label classification. On the other hand, top_k_categorical_accuracy shows us the true picture.

Clearly, we are doing a pretty decent job, considering that the training data is small and the problem is complex (25 classes). Moreover, some classes, like comedy, dominate the training data. Play with the model architecture and other hyperparameters and check how the accuracy varies.

Prediction time

For each image, let’s predict the top three classes. Below is the code for this.
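
A sketch; the filename is an illustration, and model / train_generator are the ones built above:

import numpy as np
from keras.preprocessing import image

img = image.load_img('original_train/tt0085980.jpg', target_size=(400, 300))
x = image.img_to_array(img) / 255.0
preds = model.predict(np.expand_dims(x, axis=0))[0]

# Map label indices back to genre names and take the three largest scores.
idx_to_class = {v: k for k, v in train_generator.class_indices.items()}
top3 = np.argsort(preds)[::-1][:3]
print([idx_to_class[i] for i in top3])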

The actual label for this can be found as follows.
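
Continuing the sketch above:

print(df.loc[df['imdbID'] == 'tt0085980', 'Genre'].values)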

You can see that our model is doing a decent job considering the complexity of the problem.

Let’s try another example “tt0465602.jpg“. For this the predicted labels are

By looking at the poster, most of us would predict the same labels as our algorithm did. And indeed, these are pretty close to the true labels, which are [Action, Comedy, Crime].

That’s all for the multi-label classification problem. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – apply_transform method

In this blog, we will discuss the ImageDataGenerator “apply_transform” method. Using this method, you can apply any desired transformation to an image. You can find its use in the ImageDataGenerator “flow” method. First of all, let’s discuss its Keras API.

Keras API

This applies transformations to x (3D tensor) according to the transform parameters specified.

The “transform_parameters” argument is a dictionary specifying the set of transformations to be applied. Only the following transformations are available: ‘theta‘, ‘tx‘, ‘ty‘, ‘shear‘, ‘zx‘, ‘zy‘, ‘flip_horizontal‘, ‘flip_vertical‘, ‘channel_shift_intensity‘ and ‘brightness‘.

Let’s discuss these in detail.

theta: Rotation angle in degrees. Below is an example that rotates the image by 40 degrees.
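
A sketch; the image filename is a placeholder:

from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image

img = image.img_to_array(image.load_img('bird.jpg'))

datagen = ImageDataGenerator()
rotated = datagen.apply_transform(img, {'theta': 40})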

tx and ty: These are the shifts in the vertical and horizontal directions respectively. For instance, tx=20 shifts the image vertically by 20 pixels.

Internally, the translation matrix is calculated first, and then the affine transformation is applied using the “scipy.ndimage” affine_transform method. Continuing the sketch above, a shift of 20 pixels in each direction looks like this:
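
shifted = datagen.apply_transform(img, {'tx': 20, 'ty': 20})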


zx and zy: These zoom the image in the vertical and horizontal directions respectively. If the value is less than 1, the image is zoomed in; otherwise, it is zoomed out.

Note: Negative values of zx and zy flip the image in the vertical and horizontal directions respectively. For instance, zx=-1 flips the image vertically.

flip_horizontal and flip_vertical: These flip the image horizontally and vertically. For instance, below is the code for flipping the image horizontally (continuing the same sketch).
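
flipped = datagen.apply_transform(img, {'flip_horizontal': True})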

channel_shift_intensity: This shifts the channel values by the amount specified and clips the result back to the original value range. The following code sums up how it works (again continuing the sketch).
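
shifted_channels = datagen.apply_transform(img, {'channel_shift_intensity': 50})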

brightness: This controls the brightness of the image. An enhancement factor of 0.0 gives a black image, and a factor of 1.0 gives the original image.

Hope you understand all the arguments. Now, let’s see how to use this.

How to use this?

Because “apply_transform” is a method of the ImageDataGenerator class, one first needs to create an instance of that class and then call the method, as shown below.
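
A self-contained sketch combining several of the parameters above (the filename is a placeholder):

import matplotlib.pyplot as plt
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image

img = image.img_to_array(image.load_img('bird.jpg'))

datagen = ImageDataGenerator()
out = datagen.apply_transform(img, {'theta': 40,
                                    'tx': 20,
                                    'flip_horizontal': True,
                                    'brightness': 0.7})

plt.imshow(out.astype('uint8'))
plt.show()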

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – get_random_transform method

In the previous blog, we discussed how to generate batches of augmented data using the flow method. We also learned that the key ingredient in the flow method is the “get_random_transform” method. This generates random parameters for a transformation. So, in this blog, let’s discuss this method in detail.

Keras API

Now, let’s see how this generates random parameters for transformations by just using the image shape information.

How does this work?

This borrows the parameters from the ImageDataGenerator class. For instance, if we define a rotation range in the ImageDataGenerator class, then the random rotation parameter is obtained as sketched below.
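
Paraphrased from the Keras source:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=40)

# theta is drawn uniformly from [-rotation_range, rotation_range]
theta = np.random.uniform(-datagen.rotation_range, datagen.rotation_range)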

Thus, whenever you generate examples, theta is drawn from a uniform distribution whose bounds are set by the parameters provided in the ImageDataGenerator class.

Similarly, for every transformation provided in the ImageDataGenerator class, we can obtain the random parameters. For more details, refer to the Keras GitHub.

How to use this?

To use this, you first need to specify the transformations in the ImageDataGenerator class. For instance, if I just want to rotate the image, I first specify the parameter in the ImageDataGenerator class and then call the “get_random_transform” method. This outputs a parameters dictionary, as sketched below.
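
A sketch (the printed theta will, of course, differ on every call):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=40)
params = datagen.get_random_transform(img_shape=(100, 100, 3))
print(params)
# {'theta': 23.4, 'tx': 0, 'ty': 0, 'shear': 0, 'zx': 1, 'zy': 1,
#  'flip_horizontal': False, 'flip_vertical': False,
#  'channel_shift_intensity': None, 'brightness': None}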

See how only the transformation specified in the ImageDataGenerator class, i.e. the theta value, is changed. All the other values keep their defaults.

So, this way one can generate random parameters for transformations. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – standardize method

In this blog, we will discuss the ImageDataGenerator “standardize” method. This method performs in-place normalization on a batch of inputs. As already discussed, this is an important step in the flow method and in data augmentation generally. So, let’s discuss it in detail.

Keras API

Here, x is the batch of inputs. This method returns the normalized inputs. Note that x is changed in-place. If you don’t want to change the inputs in-place, pass a copy of the input to this method.

How does this work?

While performing data augmentation with ImageDataGenerator, we discussed different normalization techniques. These techniques include centering the entire distribution or a sample, rescaling the input, performing ZCA whitening, etc. Behind the scenes, these are implemented by the “standardize” method. Let’s see how.

For instance, say we want to rescale the input by 1/255. First of all, we create an ImageDataGenerator instance, and then, for data augmentation, we use the flow method, as sketched below.
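
A sketch (x_train and y_train are assumed to be in-memory arrays):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)
training_generator = datagen.flow(x_train, y_train, batch_size=32)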

Thus, the training_generator will yield batches of augmented images. That’s all we usually do.

As already discussed in this blog, the flow method consists of three steps, of which the last step is the “standardize” method. All the normalization work in the ImageDataGenerator class is handled by this method.

Now, coming back to the above example, the “standardize” method will first check whether you want to rescale or not. If yes, then this will change the input in-place as shown below.
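
Paraphrased from the Keras source of the “standardize” method:

if self.rescale:
    x *= self.rescale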

Similarly, this method performs featurewise_center, samplewise_center, and the other normalizations. For more details, refer to the Keras GitHub.

How to use this?

First of all, create an ImageDataGenerator instance with the desired transformations. Then apply the “standardize” method as shown below.
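
A sketch:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)

images = np.random.randint(0, 256, size=(2, 32, 32, 3)).astype('float32')
normalized = datagen.standardize(images)    # `images` itself is also modified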

Note: The standardize method only applies the transformations that perform normalization, such as featurewise_center, rescale, etc. Otherwise, it returns the batch of inputs unchanged.

What does in-place mean?

As already discussed, this method normalizes the inputs in-place, which is exactly what in-place operators in Python do. Let’s take an example to understand what in-place means.

For instance, let’s rescale an image of all ones by 2. After the “standardize” method, see how the mean of the “images” change.
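
A sketch:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=2)

images = np.ones((1, 4, 4, 3))
print(images.mean())            # 1.0
datagen.standardize(images)     # no need to keep the return value
print(images.mean())            # 2.0 -- the original array itself has changed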

That’s all for “standardize” method. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – random_transform method

In the previous blog, we discussed how to generate random parameters for a transformation. In this blog, we will discuss how to apply a random transformation to an image.

Keras API

This function returns a randomly transformed version of the input image x.

How does this method work?

  • First of all, this generates random parameters for a transformation using the “get_random_transform” method. For more details, refer to this blog.
  • Then the image is transformed according to the parameters (generated above) using the “apply_transform” method. For more details, refer to this blog.

Below is a sketch of its implementation (paraphrased from the Keras source).
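
def random_transform(self, x, seed=None):
    """Apply a random transformation to a single image tensor."""
    params = self.get_random_transform(x.shape, seed)
    return self.apply_transform(x, params)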

How to use this?

To use this, you first need to specify the desired transformations in the ImageDataGenerator class. For instance, let’s say we just want to zoom the image. First, we specify the parameters in the ImageDataGenerator class, and then we apply the random_transform method, as sketched below.
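
A sketch; the filename is a placeholder:

from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image

img = image.img_to_array(image.load_img('bird.jpg'))

datagen = ImageDataGenerator(zoom_range=0.5)
zoomed = datagen.random_transform(img)   # a differently zoomed image every call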

Similarly, you can apply any random transformation to the image. Just specify the transformations in the ImageDataGenerator class. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – fit method

In the previous blog, we discussed how to perform data augmentation using ImageDataGenerator. In that, we saw that some transformations require statistics of the entire dataset. These transformations include featurewise_center, featurewise_std_normalization and zca_whitening.

To calculate these statistics, first of all, one may need to load the entire dataset into the memory. Then calculate the mean, standard deviation, principal components or any other statistics from that data. Fortunately, Keras has a built-in fit method for doing this. Let’s discuss it in detail.

Keras API

Here, x is the data from which to calculate the statistics. It should have rank 4. Note that the channel axis of x should have the value 1, 3, or 4, depending on whether the data is greyscale, RGB, or RGBA.

This also provides an option of whether to use augmented data for calculating the statistics. This is done using the “augment” argument: if True, augmented examples are also used. The number of augmented examples depends on the “rounds” parameter. For instance, if “rounds=2” and x.shape[0] (the data size) is 64, then 128 augmented examples are used.

Below is a sketch of its implementation (paraphrased from the Keras source). First, an array of zeros is created to hold the augmented examples. Then the augmented examples are generated using the random_transform method and placed into this array.
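
# Paraphrased from ImageDataGenerator.fit:
if augment:
    ax = np.zeros(tuple([rounds * x.shape[0]] + list(x.shape)[1:]),
                  dtype=self.dtype)
    for r in range(rounds):
        for i in range(x.shape[0]):
            ax[i + r * x.shape[0]] = self.random_transform(x[i])
    x = ax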

Here, x is the training data or the data whose statistics we want to calculate. Once we have the data, we can easily calculate the statistics such as mean, standard deviation and principal components using Numpy and Scipy libraries.

Note: For the featurewise statistics, the mean and standard deviation are computed over all images, rows and columns, keeping one value per channel.

Now, when we generate batches of augmented data using any method (like flow), these statistics are used to normalize the data, as sketched below.
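
Paraphrased from the “standardize” method:

if self.featurewise_center:
    x -= self.mean
if self.featurewise_std_normalization:
    x /= (self.std + 1e-6)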

How to use this?

Let’s take the MNIST digit classification example. Suppose we want to center the distribution, i.e. make the mean equal to 0. For this, we will use the ImageDataGenerator “featurewise_center” transformation. First, load the data and preprocess it.

After loading the data, create an ImageDataGenerator instance and fit it on the training data, as shown below.

Let’s calculate the mean of the training data manually and using “datagen” mean attribute.
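
A sketch of the whole example:

import numpy as np
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32')   # rank-4 input

datagen = ImageDataGenerator(featurewise_center=True)
datagen.fit(x_train)

print(x_train.mean())          # 33.318447
print(datagen.mean.ravel())    # [33.318447]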

As expected, both give the same value, i.e. 33.318447. Now, let’s see what happens to the mean of the distribution after normalization.
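
Continuing the sketch above:

x_norm = datagen.standardize(x_train.copy())
print(x_norm.mean())           # ~0, the distribution is now centered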

Clearly, this centers the distribution. Similarly, we can perform other types of normalizations also.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Keras ImageDataGenerator Normalization at validation and test time

Note: This blog should not be confused with Test time augmentation (TTA).

In the previous blogs, we discussed the different operations available for image augmentation under the ImageDataGenerator class: rotation, translation, zoom, shearing, normalization, etc. With these, our model is exposed to more aspects of the data and thus generalizes better.

But what about validation and prediction time? Since both of these are used to evaluate the model, we want them to be deterministic, which is why we don’t apply any random transformations to the validation and test data. But the test and dev sets should come from the same distribution as the train set. In other words, the test and dev sets should be normalized using the statistics calculated on the train set.

In Keras, normalization is done using the ImageDataGenerator class. So, in this blog, we will discuss how to normalize the data at prediction time using the ImageDataGenerator class.

Method-1

We create a separate ImageDataGenerator instance for the evaluation data and then fit it on the train data, as sketched below.
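
A sketch (x_train, x_val and y_val are assumed to be in-memory arrays):

from keras.preprocessing.image import ImageDataGenerator

# Both generators are fitted on the *training* data, so validation batches
# are normalized with the training statistics.
train_datagen = ImageDataGenerator(featurewise_center=True)
train_datagen.fit(x_train)

val_datagen = ImageDataGenerator(featurewise_center=True)
val_datagen.fit(x_train)

validation_generator = val_datagen.flow(x_val, y_val, batch_size=32)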

Similarly, we can do this for the test set. Because for both the validation and test sets we need to fit a generator on the train data, this is very time-consuming.

Method-2

We use the “standardize” method provided under the ImageDataGenerator class. As already discussed, the “standardize” method performs in-place normalization to the batch of inputs, which makes it perfect for this work. You can read more about normalization here.
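
A sketch, reusing the generator already fitted on the train data in Method-1 (model and x_test are assumed from your own pipeline):

test_images = train_datagen.standardize(x_test.copy())
predictions = model.predict(test_images)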

Method-3

This is similar to the above method but more explicit. Here, we read the mean and the standard deviation off the fitted generator and apply the desired normalization ourselves.
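
A sketch (std is only set when featurewise_std_normalization=True was used; x_test is assumed):

mean = train_datagen.mean
std = train_datagen.std

x_test_norm = x_test - mean
if std is not None:
    x_test_norm /= (std + 1e-6)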

I hope you now have some idea of how to apply normalization at prediction time. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Binary Classification

In this blog, we will learn how to perform binary classification using Convolutional Neural Networks. Here, we will be using the classic dogs vs cats dataset, where we have to classify an image as belonging to one of these two classes. So, let’s get started.

Downloading the Dataset

This dataset was made available as part of a Kaggle competition in 2013. You can download it from here. It contains 25,000 labeled images of dogs and cats in the train folder and 12,500 unlabeled images in the test folder. The images in the dataset do not all have the same size. Some samples from the dataset are shown below.

Since the competition is now closed, we can’t submit test predictions to Kaggle. Thus, to know how well we are doing, we will carve our own test data out of the 25,000 labeled images.

Preparing the Data

Here, we will split the train folder into 20,000 images for training and 2,500 each for validation and testing. For this, we will create 3 folders, one for each of the train, validation and test sets. Inside each of these folders, we will create 2 sub-folders, cats and dogs. You can do this manually, but here we will use the Python os module. The code for creating the folders and sub-folders is shown below.
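
A sketch; the destination path is a placeholder:

import os

base_dir = 'cats_and_dogs'
for split in ('train', 'validation', 'test'):
    for cls in ('cats', 'dogs'):
        os.makedirs(os.path.join(base_dir, split, cls), exist_ok=True)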

The above code creates the folders and sub-folders under the specified path. Now, we will put the images into these folders. The code below places 20,000 images in train and 2,500 each in the validation and test folders created above.
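
Continuing the sketch (Kaggle filenames look like ‘cat.0.jpg‘ … ‘cat.12499.jpg‘, and similarly for dogs):

import shutil

original_train = 'train'            # the unzipped Kaggle train folder

for cls in ('cat', 'dog'):
    for i in range(12500):
        fname = '{}.{}.jpg'.format(cls, i)
        if i < 10000:
            split = 'train'          # 10,000 per class -> 20,000 total
        elif i < 11250:
            split = 'validation'     # 1,250 per class -> 2,500 total
        else:
            split = 'test'           # 1,250 per class -> 2,500 total
        shutil.copy(os.path.join(original_train, fname),
                    os.path.join(base_dir, split, cls + 's', fname))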

Now, let’s display a sample image from, say, “train_cats_dir”. This is done using the following code.
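
Continuing the sketch:

import matplotlib.pyplot as plt
from keras.preprocessing import image

train_cats_dir = os.path.join(base_dir, 'train', 'cats')
sample = os.path.join(train_cats_dir, os.listdir(train_cats_dir)[0])

plt.imshow(image.load_img(sample))
plt.show()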

Data Pre-processing

The data must be processed into an appropriate form before being fed into the neural network. This includes converting the data into numpy arrays, normalizing the values between 0 and 1 (or any other suitable range), etc. This can be easily done using the Keras ImageDataGenerator class, as shown in the code below.

Here, we will use the flow_from_directory method to generate batches of data.
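
A sketch; the 150x150 target size is an illustrative choice:

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    os.path.join(base_dir, 'train'),
    target_size=(150, 150), batch_size=32, class_mode='binary')

validation_generator = val_datagen.flow_from_directory(
    os.path.join(base_dir, 'validation'),
    target_size=(150, 150), batch_size=32, class_mode='binary')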

Build Model

Since this is a binary classification problem, we use the sigmoid activation function in the last layer. The model architecture we will use is shown below.

For the compilation step, we will use the Adam optimizer with the binary crossentropy loss.
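
A sketch of one such architecture (the exact stack of layers is an illustration):

from keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])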

Callbacks

To have some control over the training, one should use callbacks. Here, we will be using the ModelCheckpoint callback, which saves the model weights whenever the validation accuracy improves.
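
A sketch (on newer Keras versions the metric is named ‘val_accuracy‘ instead of ‘val_acc‘):

from keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint('best_weights.h5', monitor='val_acc',
                             save_best_only=True, save_weights_only=True)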

Fit Model
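
A sketch of the training call (fit_generator was the generator API at the time; newer Keras versions accept generators in plain fit):

history = model.fit_generator(
    train_generator,
    steps_per_epoch=20000 // 32,
    epochs=20,
    validation_data=validation_generator,
    validation_steps=2500 // 32,
    callbacks=[checkpoint])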

Visualize Training

Let’s visualize how the loss and accuracy vary during training. This is done using the History object returned by the fit call, as shown below.
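
A sketch (the history keys are ‘accuracy‘/‘val_accuracy‘ on newer Keras):

import matplotlib.pyplot as plt

epochs = range(1, len(history.history['acc']) + 1)

plt.plot(epochs, history.history['acc'], label='train accuracy')
plt.plot(epochs, history.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()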

Clearly, our model starts overfitting after the 8th epoch. We know that to prevent overfitting, we can

  • Perform Data Augmentation
  • Use Dropout
  • Reduce the capacity of the network
  • Use Regularization etc.

So, let’s use Data Augmentation and Dropout and see how our model performs.

Data Augmentation

In this, we produce more examples from the existing ones through various operations such as rotation, translation, flipping, etc. Fortunately, in Keras, all these transformations can be performed using the ImageDataGenerator class. Below is the code for this.
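
A sketch with a few typical ranges (the exact values are illustrative; the validation generator keeps only the rescaling, since we never augment evaluation data):

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)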

Dropout

In this, we randomly turn off some neurons (set them to zero) during training. This can be easily implemented using the Keras Dropout layer. Here, I’ve just added a single Dropout layer before the Dense layer, with a dropout rate of 50%. The rest of the model is the same as above, as sketched below.
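
from keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),               # the only change from the earlier model
    layers.Dense(512, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])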

Just change the train_datagen and add the Dropout layer, then train the model the same way as we did above. Let’s visualize the training process.

Clearly, looking at the plots, we are no longer overfitting. Just by adding a Dropout layer and augmentation, we have increased the accuracy from 87% to 95%. You can further improve the accuracy by using a pre-trained model and other regularization methods.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – flow_from_directory method

In the previous blog, we learned how to generate batches of augmented data using the flow method, where the data was already loaded in memory. But this is not always the case. Sometimes, a dataset comes as folders of images, one folder per class. To use the flow method, one would first need to append the data and the corresponding labels into arrays, which is overall a tedious task.

This led to the need for a method that takes the path to a directory and generates batches of augmented data. In Keras, this is done using the flow_from_directory method. So, let’s discuss this method in detail.

Keras API

Here, directory is the path of the directory that contains the sub-directories of the respective classes. Each subdirectory is treated as a different class. The class names can either be inferred from the subdirectory names or passed explicitly using the “classes” argument. Labels are assigned to the class names alphanumerically.

For instance, suppose you have a directory structure as shown below
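
train/
├── cats/
│   ├── cat.0.jpg
│   ├── cat.1.jpg
│   └── ...
└── dogs/
    ├── dog.0.jpg
    ├── dog.1.jpg
    └── ...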

So, in this case, directory will be the path to the train folder. If we set “classes=None“, the class names will be inferred from the sub-directory names as “dogs” and “cats”. Because the labels are assigned alphanumerically, the mapping will be {‘cats’: 0, ‘dogs’: 1}. If we instead pass the argument classes=[‘Dog’, ’Cat’], the indices follow the order of the list, so the mapping will be {‘Dog’: 0, ‘Cat’: 1}.

To check the class labels, we can use the generator’s “class_indices” attribute, as shown below.
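
from keras.preprocessing.image import ImageDataGenerator

generator = ImageDataGenerator().flow_from_directory('train')
print(generator.class_indices)   # {'cats': 0, 'dogs': 1}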

This returns a dictionary containing the mapping from class names to class indices. 

The labels generated depend on the “class_mode” argument. This can take one of “categorical“, “binary“, “sparse“, “input“, or None. The default is “categorical“.

  • If “binary“, the labels are “0” and “1”.
  • If “categorical“, the labels are 2D one-hot encoded vectors.
  • If “sparse“, the labels are 1D integers.
  • For autoencoders, pass “input“.
  • Since at test time we have no labels, pass None.

Sometimes the datasets contain images that are not of the same size. So, using the “target_size” argument, we can resize the images to a fixed size using an interpolation method specified by the “interpolation” argument. Default is the nearest neighbor interpolation method.

You can also convert the color of the images using the “color_mode” argument. Available options are “grayscale“, “rgb“, “rgba“. Default is “rgb“.

You can also save the augmented images to disk by specifying the “save_to_dir” argument. You can select the image file format and the filename prefix using the “save_format” and “save_prefix” arguments respectively.

To see an example of flow_from_directory() method, you can refer to this blog.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – flow_from_dataframe method

In the previous blogs, we discussed the flow and flow_from_directory methods. Both perform the same task, i.e. generating batches of augmented data; the only thing that differs is the format or structuring of the dataset. Some of the most common formats of image datasets are

  • Keras builtin datasets
  • Datasets containing separate folders of data corresponding to the respective classes.
  • Datasets containing a single folder along with a CSV or JSON file that maps the image filenames with their corresponding classes.

We already know how to deal with the first two formats. In this blog, we will discuss how to perform data augmentation with the data available in the data frame. To do this, Keras provides a builtin flow_from_dataframe method. So, let’s discuss this method in detail.

Keras API

In this, you need to provide the data frame that contains the image names or file paths and the corresponding labels. Now, there are two cases possible:

  • if the data frame contains image names, then you need to specify the directory where these images reside, using the “directory” argument; see the sketch after this list.
  • if the data frame contains absolute image paths, then set the “directory” argument to None.
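
A sketch of both cases, assuming a dataframe with ‘filename‘ (or ‘filepath‘) and ‘label‘ columns:

from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(rescale=1./255)

# Case 1: the dataframe stores bare filenames such as 'cat.0.jpg'
iterator = gen.flow_from_dataframe(df, directory='train',
                                   x_col='filename', y_col='label',
                                   class_mode='binary')

# Case 2: the dataframe stores absolute paths
iterator = gen.flow_from_dataframe(df, directory=None,
                                   x_col='filepath', y_col='label',
                                   class_mode='binary')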

Similarly, for the labels column, the values can be a string, list, or tuple depending on the “class_mode” argument. For instance, if class_mode is binary, the label column must contain the class values as strings. Note that we can also have multiple label columns, for instance in regression tasks like bounding box prediction; in that case, pass the column names as a list in the “y_col” argument.

Rest all the arguments are the same as discussed in the ImageDataGenerator flow_from_directory blog. Now let’s take an example to see how to use this.

We will take the traditional cats vs dogs dataset. First, download the dataset from Kaggle. It contains two folders, train and test, containing 25,000 and 12,500 images respectively.

Create a Dataframe

The first step is to create a data frame that contains the filenames and the corresponding labels. For this, we will iterate over each image in the train folder and check the filename prefix: if it is a cat, we set the label to “0”, otherwise “1” (as strings, since class_mode will be binary).

Now create a data frame as
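
A sketch:

import os
import pandas as pd

filenames = os.listdir('train')
# class_mode='binary' expects string labels, hence '0' / '1'.
labels = ['0' if f.startswith('cat') else '1' for f in filenames]

df = pd.DataFrame({'filename': filenames, 'label': labels})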

Create Generators

Now, we will create the train and validation generators using the flow_from_dataframe method, as sketched below.
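
A sketch (image size, batch size and the 10% validation split are illustrative):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.1)

train_generator = datagen.flow_from_dataframe(
    df, directory='train', x_col='filename', y_col='label',
    target_size=(150, 150), class_mode='binary',
    batch_size=32, subset='training')

validation_generator = datagen.flow_from_dataframe(
    df, directory='train', x_col='filename', y_col='label',
    target_size=(150, 150), class_mode='binary',
    batch_size=32, subset='validation')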

Build the Model
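
Any small CNN will do; as a sketch, the same architecture as in the Binary Classification post above works here:

from keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])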

Train the Model

Let’s train the model using the fit_generator method.
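
A sketch, with illustrative step counts:

history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // 32,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // 32)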

Test time

At test time, we can simply use the flow_from_directory method (either method works). For this, you need to place the test images inside a subfolder of the test folder. Remember not to shuffle the data at test time, and set the class_mode argument to None.

For predictions, we can simply use the predict_generator method, as sketched below.
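
A sketch (the ‘test‘ folder is assumed to contain a single subfolder with the images):

test_datagen = ImageDataGenerator(rescale=1./255)

test_generator = test_datagen.flow_from_directory(
    'test', target_size=(150, 150), batch_size=32,
    class_mode=None, shuffle=False)

# Round the step count up so every image is predicted exactly once.
predictions = model.predict_generator(
    test_generator, steps=(test_generator.samples + 31) // 32)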

That’s all for the flow_from_dataframe method. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.