ImageDataGenerator – flow method

In the previous blog, we discussed how to apply different transformations to augment data using the Keras ImageDataGenerator class. In this blog, we will learn how to generate batches of the augmented data. This is done using the flow method, which returns an iterator that yields batches of data. Let’s first go through the Keras ImageDataGenerator flow method API and then see how to use it.

Keras API

flow(x, y=None, batch_size=32, shuffle=True, sample_weight=None, seed=None, save_to_dir=None, save_prefix='', save_format='png', subset=None)

Here, x is a NumPy array of rank 4 with shape (samples, image_height, image_width, channels) under the default channels_last data format, and y is the array of corresponding labels. For greyscale images, channels must be equal to 1.
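As a quick illustration of the expected input shape, the sketch below builds a dummy batch of greyscale images with NumPy and appends the channels axis. The array sizes are made up for the example.

```python
import numpy as np

# Hypothetical batch of 8 greyscale images, 28x28 pixels, with no channels axis yet
images = np.zeros((8, 28, 28), dtype=np.float32)

# flow expects a rank-4 array, so append a channels axis of size 1
images_rank4 = np.expand_dims(images, axis=-1)

print(images.ndim, images_rank4.shape)
```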

One can also save the augmented images to the disk by specifying the “save_to_dir” argument. You can also select which format to save the image files and what prefix to use, using the “save_format” and “save_prefix” arguments respectively.

For instance, the code below saves the augmented images to the downloads folder with names such as “aug_0_2345”. The files are written as batches are drawn from the iterator.

data_generator = datagen.flow(img, save_to_dir='D:/downloads/', save_format='jpeg', save_prefix='aug')

Another useful option is the “sample_weight” argument, which assigns a weight to each sample. When the loss is calculated, each sample's contribution is scaled by its weight, so heavily weighted samples influence the gradient more. The weights array should have the same length as the input array. If sample_weight is not None, the generator does not modify the weights; they are returned unchanged alongside each batch.
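To make the effect of per-sample weights concrete, here is a small NumPy sketch. It is a simplified illustration, not Keras's internal code: the loss values and weights are invented, and the plain mean used to reduce the batch is an assumption made for the example.

```python
import numpy as np

# Invented per-sample losses for a batch of 4 examples
per_sample_loss = np.array([0.2, 1.5, 0.7, 0.1])

# One weight per sample: the second sample counts double, the last is ignored
sample_weight = np.array([1.0, 2.0, 1.0, 0.0])

# The weights must line up one-to-one with the samples
assert len(sample_weight) == len(per_sample_loss)

# Each sample's loss is scaled by its weight before the batch is reduced
weighted_loss = per_sample_loss * sample_weight
batch_loss = weighted_loss.mean()

print(weighted_loss, batch_loss)
```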

“subset” decides whether the data generated is for training or validation. This works as follows:

First, depending on the input length and the validation_split argument of the ImageDataGenerator, the split index is determined as shown:

split_idx = int(len(x) * image_data_generator._validation_split)

Now, if subset is ‘validation’, the data is split as

x = x[:split_idx]

The rest of the data (x[split_idx:]) is reserved for training. As we can see, the split is straight, i.e. it reserves the first n examples for validation and the rest for training, with no shuffling. So, if the data is not properly shuffled beforehand (for example, if it is sorted by label), the training and validation subsets may contain different classes after the split.
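The following NumPy sketch mimics the straight split on toy data to show why shuffling matters. The helper split_subset is a hypothetical stand-in written for this example, not a Keras function, and the toy arrays are invented.

```python
import numpy as np

def split_subset(x, y, validation_split, subset):
    """Hypothetical helper mimicking the straight (unshuffled) split."""
    split_idx = int(len(x) * validation_split)
    if subset == 'validation':
        return x[:split_idx], y[:split_idx]
    return x[split_idx:], y[split_idx:]

# Toy data sorted by label: 5 samples of class 0 followed by 5 of class 1
x = np.arange(10).reshape(10, 1)
y = np.array([0] * 5 + [1] * 5)

_, y_val = split_subset(x, y, 0.2, 'validation')
_, y_train = split_subset(x, y, 0.2, 'training')
print(np.unique(y_val), np.unique(y_train))  # validation only contains class 0

# Shuffling x and y with the same permutation before the split keeps
# each image paired with its label while mixing the classes
rng = np.random.default_rng(0)
perm = rng.permutation(len(x))
x_shuf, y_shuf = x[perm], y[perm]
```

With real data, the shuffled arrays would then be passed to flow in place of the sorted ones.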

Note that for the test set, shuffle should be set to False so that the predictions stay in the same order as the labels. Also choose the batch size carefully for the test set: make sure it divides the test set size exactly, so that no examples are left out or predicted more than once.
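One way to pick such a batch size is to list the divisors of the test set size, as in the sketch below. The helper exact_batch_sizes and the cap of 128 are assumptions made for illustration; 10000 is the size of the MNIST test set.

```python
def exact_batch_sizes(n_samples, max_batch_size=128):
    """Return the batch sizes up to max_batch_size that divide n_samples exactly."""
    return [b for b in range(1, max_batch_size + 1) if n_samples % b == 0]

n_test = 10000  # number of images in the MNIST test set

candidates = exact_batch_sizes(n_test)
batch_size = candidates[-1]      # largest exact divisor under the cap
steps = n_test // batch_size     # number of batches that covers every example once

print(batch_size, steps)
```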

By now, you should have some idea about the flow method arguments. Next, let’s see how this method works.

How does the flow method work?

First, for each image in a batch, the iterator generates random parameters for a transformation using the “get_random_transform” method.
Then the transformation is applied to the image using the “apply_transform” method.
Finally, the image is standardized using the “standardize” method.
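The three steps above can be sketched with plain NumPy. The functions below are simplified stand-ins written for this example (hence the toy_ prefix), not the Keras methods themselves: the only "transformations" are a horizontal flip and a wrap-around shift, and the standardization is just a rescale.

```python
import numpy as np

def toy_get_random_transform(rng):
    # Stand-in for step 1: draw the parameters of one random augmentation
    return {'flip_horizontal': bool(rng.random() < 0.5),
            'shift_x': int(rng.integers(-2, 3))}

def toy_apply_transform(img, params):
    # Stand-in for step 2: apply the drawn parameters to one
    # (height, width, channels) image
    if params['flip_horizontal']:
        img = img[:, ::-1, :]
    # np.roll wraps pixels around the edge; used here only for simplicity
    return np.roll(img, params['shift_x'], axis=1)

def toy_standardize(img, rescale=1 / 255.):
    # Stand-in for step 3: only rescaling is shown
    return img * rescale

rng = np.random.default_rng(7)
# Invented batch of 4 random greyscale images with pixel values in [0, 255]
batch = rng.integers(0, 256, size=(4, 28, 28, 1)).astype(np.float32)

# Each image gets its own random parameters, is transformed, then standardized
augmented = np.stack([
    toy_standardize(toy_apply_transform(img, toy_get_random_transform(rng)))
    for img in batch
])

print(augmented.shape)
```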

How to use it?

Let’s take the MNIST digit classification example. First, load the required libraries and the data.

1. Load Libraries and Data

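from keras.layers import Dense, Flatten, Conv2D, MaxPool2D
from keras.models import Sequential
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator  # needed for step 3
import numpy as np
import matplotlib.pyplot as plt

# MNIST images have shape (28, 28); add a channels axis to get a rank-4 array
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = np.expand_dims(x_train, axis=-1)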

2. Build model

model = Sequential()
model.add(Conv2D(32,(3,3),activation='relu',input_shape=(28,28,1)))
model.add(MaxPool2D((2,2)))
model.add(Conv2D(64,(3,3),activation='relu'))
model.add(MaxPool2D((2,2)))
model.add(Flatten())
model.add(Dense(512,activation='relu'))
model.add(Dense(10,activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

3. Data Augmentation

Create an ImageDataGenerator instance with the set of transformations you want to perform. If you augment with transformations such as rotation or cropping, it is better to create a separate generator for the validation set, because validation data should be kept fixed. In that case, don’t use the validation_split argument. Instead, split the data yourself, for instance with train_test_split. Here we only rescale the images, so a single generator with validation_split is fine.

datagen = ImageDataGenerator(rescale=1/255.,validation_split=0.2)

4. flow method

Based on the validation_split argument in the above code, we create separate training and validation generators using the “subset” argument.

training_generator = datagen.flow(x_train, y_train, batch_size=64,subset='training',seed=7)
validation_generator = datagen.flow(x_train, y_train, batch_size=64,subset='validation',seed=7)

5. Visualize the training generator

Let’s plot the first image from each of 6 batches.

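plt.figure(figsize=(10,5))
for i in range(6):
    plt.subplot(2,3,i+1)
    # draw the next batch from the iterator
    x, y = next(training_generator)
    # the generator has already rescaled the pixels to [0, 1], so no extra division is needed
    plt.imshow(x[0].reshape(28,28), cmap='gray')
    plt.title('y={}'.format(y[0]))
    plt.axis('off')
plt.tight_layout()
plt.show()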

6. Train model

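# step counts should be integers; in recent Keras releases fit_generator is deprecated and model.fit accepts the generator directly
history = model.fit_generator(training_generator, steps_per_epoch=int(len(x_train)*0.8)//64, epochs=10, validation_data=validation_generator, validation_steps=int(len(x_train)*0.2)//64)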
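The step counts used for training can be checked with a little arithmetic. The sketch below assumes the 60000 MNIST training images, the 0.2 validation split, and the batch size of 64 used in this example; the variable names are chosen for illustration.

```python
import math

n_total = 60000          # MNIST training images
validation_split = 0.2
batch_size = 64

# Same split rule as the flow method: the first split_idx samples go to validation
split_idx = int(n_total * validation_split)
n_val = split_idx
n_train = n_total - split_idx

# Floor division counts only the full batches
train_steps = n_train // batch_size
val_steps = n_val // batch_size

# Rounding up also covers the last, partially filled validation batch
val_steps_all = math.ceil(n_val / batch_size)

print(train_steps, val_steps, val_steps_all)
```

Since 12000 is not a multiple of 64, floor division skips the final partial validation batch in each pass, while rounding up includes it.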

Similarly, you can create a test generator and evaluate the performance of the model on the test set. This is how you can use the flow method. Hope you enjoyed reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

TheAILearner

Mastering Artificial Intelligence