
ImageDataGenerator – flow method

In the previous blog, we discussed how to apply different transformations to augment data using the Keras ImageDataGenerator class. In this blog, we will learn how to generate batches of the augmented data. This is done using the flow method, which creates an iterator that yields batches of data. Let’s first discuss the Keras ImageDataGenerator flow method API and then see how to use it.

Keras API

Here, x is a NumPy array of rank 4 with shape (samples, height, width, channels) under the default channels_last data format, and y holds the corresponding labels. For greyscale images, channels must be equal to 1.
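As a minimal sketch of the call, assuming the TensorFlow-bundled Keras (tensorflow.keras) and random stand-in data rather than real images: the signature in the comment lists the commonly documented defaults, which may differ slightly between Keras versions.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Commonly documented signature (defaults may vary by version):
# flow(x, y=None, batch_size=32, shuffle=True, sample_weight=None, seed=None,
#      save_to_dir=None, save_prefix='', save_format='png', subset=None)

# Stand-in data (an assumption): 10 greyscale 28x28 images, so channels == 1.
x = np.random.rand(10, 28, 28, 1).astype("float32")
y = np.arange(10)

datagen = ImageDataGenerator(horizontal_flip=True)
iterator = datagen.flow(x, y, batch_size=4, shuffle=True, seed=0)

# Each call to next() yields one (images, labels) batch.
x_batch, y_batch = next(iterator)
print(x_batch.shape, y_batch.shape)
```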

You can also save the augmented images to disk by specifying the “save_to_dir” argument. The file format and the filename prefix are set with the “save_format” and “save_prefix” arguments respectively.

For instance, setting save_to_dir to the downloads folder and save_prefix to “aug” saves the augmented files there with names such as “aug_0_2345”, i.e. the prefix, the sample index, and a random number.
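A small sketch of saving augmented images, assuming Pillow is installed (Keras needs it to write image files). A temporary directory stands in for the downloads folder, and the data is random.

```python
import os
import tempfile

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in data (an assumption): 8 greyscale images with pixel values in [0, 255).
x = (np.random.rand(8, 28, 28, 1) * 255).astype("float32")

# A temporary directory is used here in place of the downloads folder.
out_dir = tempfile.mkdtemp()

datagen = ImageDataGenerator(horizontal_flip=True)
iterator = datagen.flow(
    x,
    batch_size=4,
    shuffle=False,
    save_to_dir=out_dir,
    save_prefix="aug",
    save_format="png",
)

# Files are written as a side effect of generating a batch.
x_batch = next(iterator)
saved = sorted(os.listdir(out_dir))
print(saved)
```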

Another interesting option is to weight each sample using the “sample_weight” argument. When the loss is calculated, each sample's contribution is then scaled by its own weight, which changes how strongly it influences the gradient. The weights array should have the same length as the input array. If sample_weight is not None, the weights are returned unchanged as a third element of each batch.
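To illustrate, here is a short sketch with made-up weights on random stand-in data. With sample_weight set, each batch becomes a tuple of three arrays.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in data and weights (assumptions for illustration only).
x = np.random.rand(10, 28, 28, 1).astype("float32")
y = np.arange(10)
weights = np.linspace(0.1, 1.0, num=10)  # one weight per sample

datagen = ImageDataGenerator()
iterator = datagen.flow(x, y, sample_weight=weights, batch_size=5, shuffle=False)

# The batch now carries (images, labels, sample weights).
x_batch, y_batch, w_batch = next(iterator)
print(w_batch)
```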

“subset” decides whether the data generated is for training or validation. This works as follows:

First, the split index is determined from the input length and the validation_split argument of the ImageDataGenerator: split_idx = int(len(x) * validation_split).

If subset is ‘validation’, the first split_idx samples, x[:split_idx], are used.

The rest of the data, x[split_idx:], is reserved for training. The split is therefore purely positional: it reserves the first n examples for validation and the rest for training. So, if the data is not properly shuffled (for example, if it is sorted by label), training and validation may end up with different sets of classes after the split.
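The splitting rule described above can be sketched in plain NumPy. The helper name split_like_flow is hypothetical and is not a Keras function; it only mirrors the described behaviour to show what happens with label-sorted data.

```python
import numpy as np

def split_like_flow(x, validation_split, subset):
    """Positional split: first part for validation, rest for training."""
    split_idx = int(len(x) * validation_split)
    if subset == "validation":
        return x[:split_idx]
    return x[split_idx:]

# Labels sorted by class: 5 classes, 2 samples each.
labels = np.repeat([0, 1, 2, 3, 4], 2)

val_labels = split_like_flow(labels, 0.2, "validation")
train_labels = split_like_flow(labels, 0.2, "training")

# Validation sees only class 0; training never sees class 0.
print(np.unique(val_labels), np.unique(train_labels))
```

Shuffling the arrays (with the same permutation for x and y) before calling flow avoids this situation.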

Note that for the test set, shuffle should be set to False so that the predictions stay in the same order as the labels. Also set the batch size carefully for the test set: make sure it divides the number of test examples exactly, since you don’t want to leave out some examples or predict on some examples more than once.
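One simple way to pick such a batch size is to list the divisors of the test set size. The helper exact_batch_sizes below is a hypothetical name used for illustration; 10000 is the size of the MNIST test set.

```python
def exact_batch_sizes(n_samples, max_batch_size):
    """Return all batch sizes up to max_batch_size that divide n_samples exactly."""
    return [b for b in range(1, max_batch_size + 1) if n_samples % b == 0]

# MNIST has 10000 test images; look for batch sizes of at most 128.
sizes = exact_batch_sizes(10000, 128)
print(sizes)

# Largest candidate, and the number of steps needed to cover the test set once.
batch_size = sizes[-1]
steps = 10000 // batch_size
print(batch_size, steps)
```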

By now, you should have a fair idea of the flow method arguments. Next, let’s see how this method works.

How does the flow method work?

  • Firstly, for each image in the batch, the iterator generates random parameters for a transformation using the “get_random_transform” method.
  • Then these transformations are applied to the image using the “apply_transform” method.
  • Finally, the image is standardized using the “standardize” method.
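The three steps above can be reproduced by hand on a single image. This is a sketch on a random stand-in image, assuming SciPy is installed (Keras uses it for affine transforms); the chosen transformations are only examples.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True, rescale=1.0 / 255)

# Stand-in image (an assumption): one greyscale 28x28 image without a batch axis.
image = np.random.randint(0, 256, size=(28, 28, 1)).astype("float32")

# Step 1: draw random transformation parameters for this image shape.
params = datagen.get_random_transform(image.shape, seed=0)

# Step 2: apply exactly those parameters to the image.
transformed = datagen.apply_transform(image, params)

# Step 3: standardize (here: rescale to [0, 1]). A copy is passed because
# standardize may modify its input in place.
standardized = datagen.standardize(transformed.copy())

print(params)
print(standardized.shape, standardized.max())
```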

How to use it?

Let’s take the MNIST digit classification example. First, load the required libraries and the data.

1. Load Libraries and Data
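A possible sketch of this step, not necessarily the original code: the helper names to_rank4 and load_mnist_for_flow are hypothetical. MNIST images come as (n, 28, 28) arrays, so a channel axis is added to get the rank-4 input that flow expects.

```python
import numpy as np
from tensorflow.keras.datasets import mnist

def to_rank4(images):
    """Add a trailing channel axis: (n, 28, 28) -> (n, 28, 28, 1), as float32."""
    return np.expand_dims(images, axis=-1).astype("float32")

def load_mnist_for_flow():
    """Load MNIST and reshape the images for ImageDataGenerator.flow."""
    # mnist.load_data() downloads the dataset on first use.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    return (to_rank4(x_train), y_train), (to_rank4(x_test), y_test)
```

Calling (x_train, y_train), (x_test, y_test) = load_mnist_for_flow() then gives arrays ready for the steps below.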

2. Build model

3. Data Augmentation

Create an ImageDataGenerator instance with the set of transformations you want to perform. If you perform augmentation using transformations such as rotation, cropping, etc., it is better to create a separate generator for the validation set, because validation data should be kept fixed. In that case, don’t use the validation_split argument. Instead, split the data yourself, for instance with train_test_split.
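Here is a hedged sketch of the separate-generator approach. Random arrays stand in for MNIST, and a plain NumPy permutation stands in for train_test_split to keep the example free of extra dependencies; the transformation values are arbitrary.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in data (an assumption): 100 greyscale images with 10 classes.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(100, 28, 28, 1)).astype("float32")
y = np.arange(100) % 10

# Manual shuffled split: 20 validation samples, 80 training samples.
perm = rng.permutation(len(x))
val_idx, train_idx = perm[:20], perm[20:]

# Augmentation only for training; validation is only rescaled.
train_datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1, rescale=1.0 / 255)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_gen = train_datagen.flow(x[train_idx], y[train_idx], batch_size=16)
val_gen = val_datagen.flow(x[val_idx], y[val_idx], batch_size=16, shuffle=False)

x_val_batch, y_val_batch = next(val_gen)
print(x_val_batch.shape)
```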

4. flow method

Based on the validation_split argument given to the ImageDataGenerator, we create separate training and validation generators using the “subset” argument.
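A minimal sketch of this step, using random stand-in arrays in place of MNIST and arbitrary augmentation settings. The labels cycle through all 10 classes so that both subsets contain every class.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in data (an assumption): 100 greyscale images, labels 0..9 repeating.
x_train = np.random.randint(0, 256, size=(100, 28, 28, 1)).astype("float32")
y_train = np.arange(100) % 10

# validation_split is set on the generator, subset on each flow call.
datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=10, validation_split=0.2)

train_gen = datagen.flow(x_train, y_train, batch_size=16, subset="training")
val_gen = datagen.flow(x_train, y_train, batch_size=16, subset="validation")

# n is the number of samples each iterator draws from.
print(train_gen.n, val_gen.n)
```

Note that both subsets share the same transformations here, which is the limitation discussed in the previous step.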

5. Visualize the training generator

Let’s plot the first image from each of 6 batches.
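One way to do this with matplotlib, shown on random stand-in images (so the pictures are noise rather than digits); the generator settings are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in data (an assumption): 60 greyscale images with labels 0..9.
x_train = np.random.randint(0, 256, size=(60, 28, 28, 1)).astype("float32")
y_train = np.arange(60) % 10

datagen = ImageDataGenerator(rotation_range=10, rescale=1.0 / 255)
train_gen = datagen.flow(x_train, y_train, batch_size=8, seed=0)

# One subplot per batch, showing the first image of that batch.
fig, axes = plt.subplots(1, 6, figsize=(12, 2))
for ax in axes:
    x_batch, y_batch = next(train_gen)
    ax.imshow(x_batch[0].squeeze(), cmap="gray")
    ax.set_title(f"label: {y_batch[0]}")
    ax.axis("off")
```

Call plt.show() or fig.savefig(...) afterwards to display or store the figure.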

6. Train model
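A compact sketch of training on the generators. The tiny CNN and the random stand-in data are assumptions for illustration, not the original model or dataset, and a single epoch is used only to keep the example quick.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in data (an assumption): 64 greyscale images, labels 0..9 repeating.
x_train = np.random.randint(0, 256, size=(64, 28, 28, 1)).astype("float32")
y_train = np.arange(64) % 10

# A deliberately small model; any classifier with a 10-way output would do.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=10, validation_split=0.25)
train_gen = datagen.flow(x_train, y_train, batch_size=16, subset="training")
val_gen = datagen.flow(x_train, y_train, batch_size=16, subset="validation")

# The iterators are passed directly to fit; one epoch covers each subset once.
history = model.fit(train_gen, validation_data=val_gen, epochs=1, verbose=0)
print(history.history.keys())
```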

Similarly, you can create the test generator and evaluate the performance of the model on the test set. This is how you can use the flow method. Hope you enjoyed reading.

If you have any doubts or suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.