
ImageDataGenerator – flow_from_directory method

In the previous blog, we learned how to generate batches of augmented data using the flow method. In that case, the data was loaded into memory. But this is not always how datasets arrive. Sometimes, the datasets we download contain folders of data corresponding to the respective classes. To use the flow method, one would first need to append the data and the corresponding labels into arrays and then call the flow method on those arrays. Overall, this is a tedious task.

This led to the need for a method that takes the path to a directory and generates batches of augmented data. In Keras, this is done using the flow_from_directory method. So, let’s discuss this method in detail.

Keras API
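The original post showed the method's signature here. As a reference, here is a hedged sketch of a typical call with the argument defaults as in Keras 2.x (the directory path is only an illustration):

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)

train_generator = datagen.flow_from_directory(
    directory='data/train',        # path to the folder holding one sub-folder per class
    target_size=(256, 256),        # all images are resized to this size
    color_mode='rgb',
    classes=None,                  # infer class names from the sub-directory names
    class_mode='categorical',
    batch_size=32,
    shuffle=True,
    seed=None,
    save_to_dir=None,
    save_prefix='',
    save_format='png',
    subset=None,
    interpolation='nearest'
)
```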

Here, the directory is the path of the directory that contains the sub-directories of the respective classes. Each subdirectory is treated as a different class. The name of the class can either be inferred from the subdirectory name or can be passed using the “classes” argument. The labels to these classes are assigned alphanumerically.

For instance, suppose you have a train directory containing two sub-directories, dogs and cats, each holding the images of that class.

So, in this case, the directory will be the path to the train folder. If we set classes=None, the class names will be inferred from the sub-directory names as "dogs" and "cats". Because the labels are assigned alphanumerically, the labels will be {'cats': 0, 'dogs': 1}. If instead we pass the argument classes=['Dog','Cat'], then the labels will be {'Cat': 0, 'Dog': 1}.

To check the class labels, we can use the "class_indices" attribute.
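For example, assuming the train_generator created above:

```python
# dictionary mapping class names to integer labels (assigned alphanumerically)
print(train_generator.class_indices)    # e.g. {'cats': 0, 'dogs': 1}
```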

This returns a dictionary containing the mapping from class names to class indices. 

The labels generated depend on the "class_mode" argument. This can be one of "categorical", "binary", "sparse", "input", or None. The default is "categorical".

  • If "binary", the labels are "0" and "1".
  • For "categorical", we get 2D one-hot encoded labels.
  • If "sparse", we get 1D integer labels.
  • For autoencoders, pass this as "input".
  • Since at test time we have no labels, pass None.

Sometimes the datasets contain images that are not of the same size. So, using the “target_size” argument, we can resize the images to a fixed size using an interpolation method specified by the “interpolation” argument. Default is the nearest neighbor interpolation method.

You can also convert the color of the images using the “color_mode” argument. Available options are “grayscale“, “rgb“, “rgba“. Default is “rgb“.

You can also save the augmented images to the disk by specifying the “save_to_dir” argument. You can also select which format to save the image files and what prefix to use, using the “save_format” and “save_prefix” arguments respectively.
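A hedged sketch of how these arguments fit together (the folder names are assumptions, and the save directory must already exist):

```python
augmented_generator = datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    class_mode='binary',
    save_to_dir='augmented',    # augmented images are written here as they are generated
    save_prefix='aug',
    save_format='png'
)
```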

To see an example of flow_from_directory() method, you can refer to this blog.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – flow_from_dataframe method

In the previous blogs, we discussed the flow and flow_from_directory methods. Both methods perform the same task, i.e., generate batches of augmented data. The only thing that differs is the format or structure of the datasets. Some of the most common formats for image datasets are:

  • Keras builtin datasets
  • Datasets containing separate folders of data corresponding to the respective classes.
  • Datasets containing a single folder along with a CSV or JSON file that maps the image filenames with their corresponding classes.

We already know how to deal with the first two formats. In this blog, we will discuss how to perform data augmentation with the data available in the data frame. To do this, Keras provides a builtin flow_from_dataframe method. So, let’s discuss this method in detail.

Keras API
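The original post showed the method's signature here. A hedged sketch of a typical call (defaults as in recent Keras / keras-preprocessing versions; the names below are illustrative):

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)

generator = datagen.flow_from_dataframe(
    dataframe=df,              # pandas DataFrame with image names/paths and labels
    directory='data/train',    # set to None if x_col already holds absolute paths
    x_col='filename',          # column containing the image names
    y_col='label',             # column (or list of columns) containing the labels
    target_size=(256, 256),
    color_mode='rgb',
    classes=None,
    class_mode='categorical',
    batch_size=32,
    shuffle=True,
    seed=None,
    subset=None,
    interpolation='nearest'
)
```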

In this, you need to provide the data frame that contains the image names or file paths and the corresponding labels. Now, there are two cases possible:

  • if the data frame contains image names then you need to specify the directory where these images are residing, using the “directory” argument. See the example below.
  • if the data frame contains the absolute image paths then set the “directory” argument to None.

Similarly, for the labels column, the values can be a string, list, or tuple depending on the "class_mode" argument. For instance, if class_mode is binary, then the label column must contain the class values as strings. Note that we can also have multiple label columns, for instance in regression tasks like bounding box prediction. In that case, you need to pass these columns as a list in the "y_col" argument.

The rest of the arguments are the same as discussed in the ImageDataGenerator flow_from_directory blog. Now let's take an example to see how to use this.

We will take the traditional cats vs dogs dataset. First, download the dataset from Kaggle. This dataset contains two folders, train and test, containing 25000 and 12500 images respectively.

Create a Dataframe

The first step is to create a data frame with a filename column and the corresponding label column. For this, we will iterate over each image in the train folder and check the filename prefix. If it is a cat, set the label to '0', otherwise '1' (as strings, since we will use class_mode='binary'), as shown below.
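A minimal sketch of this step (the folder name 'train' is an assumption for where the Kaggle images were extracted):

```python
import os

train_dir = 'train'               # extracted Kaggle train folder
filenames, labels = [], []

for fname in os.listdir(train_dir):
    filenames.append(fname)
    # filenames look like "cat.0.jpg" / "dog.123.jpg"
    labels.append('0' if fname.startswith('cat') else '1')
```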

Now create a data frame as
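Continuing from the lists built above:

```python
import pandas as pd

df = pd.DataFrame({'filename': filenames, 'label': labels})
print(df.head())
```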

Create Generators

Now, we will create the train and validation generator using the flow_from_dataframe method as
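A hedged sketch (the 80/20 split, image size, and batch size are illustrative choices):

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_generator = datagen.flow_from_dataframe(
    dataframe=df, directory=train_dir,
    x_col='filename', y_col='label',
    target_size=(150, 150), class_mode='binary',
    batch_size=32, subset='training'
)

valid_generator = datagen.flow_from_dataframe(
    dataframe=df, directory=train_dir,
    x_col='filename', y_col='label',
    target_size=(150, 150), class_mode='binary',
    batch_size=32, subset='validation'
)
```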

Build the Model
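The original architecture is not reproduced here; below is a simple CNN sketch that works with the generators above:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid')     # single output for the binary cats-vs-dogs task
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```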

Train the Model

Let’s train the model using the fit_generator method.
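For example (the number of epochs is an arbitrary choice):

```python
model.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator),
    validation_data=valid_generator,
    validation_steps=len(valid_generator),
    epochs=10
)
```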

Test time

So, at test time, we can simply use the flow_from_directory method (you can use either method). For this, you need to create a subfolder inside the test folder. Remember not to shuffle the data at test time, and set the class_mode argument to None since there are no labels.

For predictions, we can simply use the predict_generator method.
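A hedged sketch of the test-time code (assuming the test images sit inside a sub-folder such as test/test_images/):

```python
test_datagen = ImageDataGenerator(rescale=1./255)

test_generator = test_datagen.flow_from_directory(
    directory='test',
    target_size=(150, 150),
    class_mode=None,     # no labels at test time
    shuffle=False,       # keep the order so predictions match the filenames
    batch_size=32
)

predictions = model.predict_generator(test_generator, steps=len(test_generator))
```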

That’s all for the flow_from_dataframe method. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

ImageDataGenerator – flow method

In the previous blog, we discussed how to apply different transformations to augment data using the Keras ImageDataGenerator class. In this blog, we will learn how to generate batches of augmented data. This is done using the flow method, which creates an iterator. We can easily iterate over the iterator to yield batches of data. Let's first discuss the Keras ImageDataGenerator flow method API and then see how to use it.

Keras API
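The original post showed the signature here. A hedged sketch with the arguments and their defaults (Keras 2.x), using dummy arrays:

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20)

x = np.random.rand(100, 32, 32, 3)      # rank-4 array: (samples, height, width, channels)
y = np.random.randint(0, 10, 100)       # corresponding labels

iterator = datagen.flow(
    x, y,
    batch_size=32,
    shuffle=True,
    sample_weight=None,
    seed=None,
    save_to_dir=None,
    save_prefix='',
    save_format='png',
    subset=None
)
```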

Here, x is a Numpy array of rank 4 (samples, height, width, channels) and y is the corresponding labels. For grayscale images, channels must be equal to 1.

One can also save the augmented images to the disk by specifying the “save_to_dir” argument. You can also select which format to save the image files and what prefix to use, using the “save_format” and “save_prefix” arguments respectively.

For instance, the code below saves the augmented images to the downloads folder with names like "aug_0_2345.png".
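Continuing the example above (the 'downloads' folder name is an assumption and must already exist):

```python
it = datagen.flow(
    x, y,
    batch_size=32,
    save_to_dir='downloads',   # augmented images are written here
    save_prefix='aug',
    save_format='png'
)
next(it)   # drawing a batch triggers the augmentation and the save
```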

Another interesting thing is that one can weight each sample using the "sample_weight" argument. While calculating the loss, each sample then has its own weight, which controls the gradient direction. This should have the same length as the input array. These sample weights, if not None, are returned as is.

The "subset" argument decides whether the data generated is for training or validation. This works as follows:

First of all, depending on the input length and the validation_split argument of the ImageDataGenerator, a split index is determined (see the sketch below).

Now, if the subset is 'validation', the data is split as follows:
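A simplified sketch of this internal logic (not the actual Keras source):

```python
def split_subset(x, y, validation_split, subset):
    """Roughly how ImageDataGenerator splits the data for the given subset."""
    split_idx = int(len(x) * validation_split)
    if subset == 'validation':
        return x[:split_idx], y[:split_idx]     # first part goes to validation
    return x[split_idx:], y[split_idx:]         # the rest is used for training
```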

The rest of the data is reserved for training. As we can see, the split is sequential, i.e., it reserves the first n examples for validation and the rest for training. So, training and validation may end up with a different number of classes after the split if the data is not properly shuffled.

Note: for the test set, set shuffle to False. Also set the batch size carefully for the test set: make sure it divides the test set size exactly, as you don't want to leave out some examples or predict some examples multiple times.

Now, you might have got some idea about the flow method arguments. Next, let’s see how this method works.

How does the flow method work?

  • Firstly, this generates random parameters for a transformation using the “get_random_transform” method.
  • Then these transformations are applied using the “apply_transform” method.
  • Finally, the image is standardized using the “standardize” method.

How to use?

Let's take the MNIST digit classification example. First, load the required libraries and the data.

1. Load Libraries and Data
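A minimal sketch for this step:

```python
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# reshape to rank 4 (samples, height, width, channels) and one-hot encode the labels
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32')
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32')
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```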

2. Build model
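A simple CNN sketch (the original model may differ):

```python
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```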

3. Data Augmentation

Create an ImageDataGenerator instance with the set of transformations you want to perform. If you perform augmentation using transformations such as rotation, cropping, etc., it is better to create a separate generator for the validation set, because the validation data should be kept fixed. In that case, don't use the validation_split argument; instead, use some other method for splitting, for instance train_test_split. A minimal generator is shown below.
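For example (the transformation values are illustrative):

```python
# 20% of the data is held out for validation via validation_split
datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    validation_split=0.2
)
```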

4. flow method

Based on the validation split argument in the above code, we create a separate training and validation generator using the “subset” argument.
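For example:

```python
train_generator = datagen.flow(x_train, y_train, batch_size=64,
                               subset='training', seed=7)
valid_generator = datagen.flow(x_train, y_train, batch_size=64,
                               subset='validation', seed=7)
```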

5. Visualize the training generator

Let's plot the first image from each of 6 batches.
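A minimal plotting sketch:

```python
fig, axes = plt.subplots(1, 6, figsize=(12, 2))
for i in range(6):
    batch_x, batch_y = next(train_generator)
    axes[i].imshow(batch_x[0].reshape(28, 28), cmap='gray')   # first image of each batch
    axes[i].axis('off')
plt.show()
```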

6. Train model
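For example (the epoch count is arbitrary):

```python
model.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator),
    validation_data=valid_generator,
    validation_steps=len(valid_generator),
    epochs=5
)
```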

Similarly, you can create the test generator and evaluate the performance of the model on the test set. This is how you can use the flow method. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Data Augmentation with Keras ImageDataGenerator

One of the methods to prevent overfitting is to have more data. This way, our model will be exposed to more aspects of the data and thus will generalize better. To get more data, you either collect more data manually or generate data from the existing data by applying some transformations. The latter method is known as data augmentation.

In this blog, we will learn how we can perform data augmentation using Keras ImageDataGenerator class. First, we will discuss keras image augmentation API and then we will learn how to use this.

Keras API
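The original post showed the class constructor here. For reference, a sketch with the arguments and their default values (Keras 2.x):

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    zca_epsilon=1e-6,
    rotation_range=0,
    width_shift_range=0.0,
    height_shift_range=0.0,
    brightness_range=None,
    shear_range=0.0,
    zoom_range=0.0,
    channel_shift_range=0.0,
    fill_mode='nearest',
    cval=0.0,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=None,
    preprocessing_function=None,
    data_format=None,
    validation_split=0.0
)
```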

Let's understand each of its arguments in detail.

featurewise_center: Feature-wise means over the entire dataset. In this, we first calculate the mean over the entire dataset and then subtract this mean from each image, which shifts the mean of the distribution close to zero. To calculate the mean, you need to fit the data generator to the training data as shown below.
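For example (x_train is assumed to be the rank-4 training array):

```python
datagen = ImageDataGenerator(featurewise_center=True)
datagen.fit(x_train)   # computes the dataset-wide mean used for centering
```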

For this, you have to load the entire training dataset, which may exhaust your memory if the dataset is large. To prevent this, one can calculate the mean from a smaller representative sample.

featurewise_std_normalization: In this, we divide each image by the standard deviation of the entire dataset. Thus, featurewise_center and featurewise_std_normalization together (known as standardization) give the data zero mean and unit standard deviation, as in a standard normal distribution.

samplewise_center: Sample-wise means over a single image. In this, we set the mean pixel value of each image to zero. Since the image mean is a local statistic that can be calculated from the image itself, there is no need to call the fit method.

samplewise_std_normalization: In this, we divide each input image by its standard deviation.

zca_whitening: This is a preprocessing method which tries to remove the redundancy from the data while keeping its structure intact, unlike PCA. In short, this strengthens the high-frequency components in the image. For the maths behind this, refer to this StackOverflow question. You need to fit the training data to calculate the principal components. This should be used with featurewise_center=True; otherwise, it will give you a warning and automatically set featurewise_center=True.

Note: For featurewise_center, featurewise_std_normalization, zca_whitening, one must fit the data to calculate the mean, standard deviation, and principal components.

rotation_range: This rotates each image by a random angle up to the value specified. For instance, rotation_range=45 rotates an image by a random angle in the range [-45, 45] degrees.

width_shift_range: This results in shifting the image in the horizontal direction.

  • If it is a float less than 1, then this shifts the image by that fraction of the width. For instance, 0.2 means shift horizontally by up to 20% of the image width.
  • If it is an integer >= 1, then this shifts the image horizontally by an integer number of pixels from the interval (-num, +num). For instance, 3 means the shift is chosen from [-2, -1, 0, 1, 2], so the image may be shifted by up to 2 pixels either way.
  • If a 1D array is passed, the shift values are sampled from that array.

height_shift_range: Similar to width_shift_range but in the vertical direction.

brightness_range: This produces images similar to those taken under different lighting conditions. In this, you pass a min and max range based on which the image will be darkened or brightened. Values < 1 darken the image, values > 1 brighten it, and 1 means no change. For example, the line below darkens the image.
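For example (both bounds are below 1, so every generated image is darkened; the exact values are an assumption):

```python
datagen = ImageDataGenerator(brightness_range=[0.2, 0.8])
```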

rescale: This is to normalize the pixel values to a specific range. For an 8-bit image, we generally rescale by 1/255 so as to have pixel values in the range [0, 1].

shear_range: This is the shear angle in the counter-clockwise direction in degrees.

zoom_range: This zooms the image. If passed as a float, then [lower, upper] = [1 - zoom_range, 1 + zoom_range]. For instance, 0.2 means zoom in the range [0.8, 1.2]. A list [lower, upper] can also be passed directly.

channel_shift_range: This randomly shifts the channel values by a random amount within the range specified. The sketch below sums up what this actually does.

It adds the same random value to each channel and then clips the result to the minimum and maximum of the image.
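A simplified sketch of that logic (not the actual Keras source):

```python
import numpy as np

def channel_shift(x, channel_shift_range):
    """Add one random value to all channels, then clip to the image's value range."""
    intensity = np.random.uniform(-channel_shift_range, channel_shift_range)
    min_x, max_x = np.min(x), np.max(x)
    return np.clip(x + intensity, min_x, max_x)
```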

Figure: image after a channel shift of 100.

horizontal_flip and vertical_flip: Randomly flip the input image in the horizontal and vertical directions respectively.

data_format: Either channels_first or channels_last (default).

preprocessing_function: This function is applied to each input after the augmentation step. Below is an example of one such function where images are blurred
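A hedged sketch using an OpenCV Gaussian blur (the kernel size is an arbitrary choice):

```python
import cv2
from keras.preprocessing.image import ImageDataGenerator

def blur(img):
    # applied to each image after resizing and augmentation
    return cv2.GaussianBlur(img, (5, 5), 0)

datagen = ImageDataGenerator(preprocessing_function=blur)
```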

How to use this?

Below is the kind of code used to generate the augmented examples shown in the original post.
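A minimal sketch of such code (the image path and the chosen transformations are assumptions):

```python
import cv2
import matplotlib.pyplot as plt
from keras.preprocessing.image import ImageDataGenerator, img_to_array

# load a sample image and add a batch dimension
img = cv2.cvtColor(cv2.imread('sample.jpg'), cv2.COLOR_BGR2RGB)
img = img_to_array(img)
img = img.reshape((1,) + img.shape)

datagen = ImageDataGenerator(rotation_range=45,
                             brightness_range=[0.2, 0.8],
                             horizontal_flip=True)

# draw and plot a few augmented samples
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for ax, batch in zip(axes, datagen.flow(img, batch_size=1)):
    ax.imshow(batch[0].astype('uint8'))
    ax.axis('off')
plt.show()
```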

This way you can create augmented examples. In the next blog, we will discuss how to generate batches of augmented data using the flow method.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Optical Character Recognition Pipeline: Text Detection and Segmentation

One of the most important modules in an optical character recognition pipeline is text detection and segmentation, which is also called text localization. In the previous blog, we have seen various techniques to pre-process the input image that can help in improving our OCR accuracy. In this blog, we will learn how to localize text in an image, so that we can crop the text regions out and then feed them to our text recognition module to predict the text in them.

What is text detection and segmentation?

It is the process of localizing all occurrences of text present in the image into meaningful units such as characters, words, and text lines, and then making segments of each of these units.

Character-based detection first detects individual characters and then groups them into words. One way to do this is to locate characters by classifying Extremal Regions (MSER) and then group the detected characters using an exhaustive search method.

Word-based detection usually works in a similar fashion as object detection. You can use Faster R-CNN and YOLO algorithms to perform this.

Text-line based detection detects text lines and then breaks them into individual words.

There are basically two types of text images that are fed to the text recognition module as input: one is scanned documents, and the other is natural scene text such as street signs, storefront text, etc.

Scanned Documents

Scanned documents generally have hundreds or thousands of words in them. We can apply deep neural networks like Faster R-CNN and YOLO to localize the words present in a document. But sometimes these may not be able to localize all the text present in the image, because these algorithms are generally trained to detect a smaller number of objects per image. In that case, we need to apply some post-processing after the deep nets to recognize the remaining text.

Another method, which can be used for scanned documents, is Maximally Stable Extremal Regions (MSER) using OpenCV.

MSER is a method used for blob detection in images. Using this method, we can get the coordinates of the text regions and then generate bounding boxes around each word in the image, which gives us the required input images for our text recognition module.
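A minimal OpenCV sketch of this idea (the image path is an assumption; real documents usually need extra filtering and merging of boxes):

```python
import cv2

img = cv2.imread('scanned_doc.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)    # candidate text blobs and their boxes

# draw a bounding box around each detected region
for (x, y, w, h) in bboxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)

cv2.imwrite('mser_regions.jpg', img)
```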

Natural Scenes

Natural scenes contain fewer words but come with other problems like distortions, occlusions, directional blur, cluttered backgrounds, etc. To overcome these problems, we need deep learning algorithms that focus mainly on natural scene text and are robust to the above distortions. There are some robust open source algorithms available, such as EAST, CTPN, TextBoxes++, and PixelLink. These algorithms can also be used for localizing text in scanned documents, but then you need to do some post-processing to detect all the text present in the image, as mentioned earlier.


Till now, we have seen what text segmentation is and different algorithms to localize text in an image. In the next blog, we will dive deeper into these algorithms and figure out how we can implement them in our OCR pipeline.

Next Blog: Optical Character Recognition Pipeline: Text Detection and Segmentation Part-II

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Optical Character Recognition Pipeline: Text Detection and Segmentation Part-II

In the last blog, we saw what text detection is and different types of algorithms to perform it. In this blog, we will learn more about text detection algorithms.

Efficient and Accurate Scene Text Detector (EAST)

It is a deep learning text detection method which has two stages: a fully convolutional network (FCN) and a non-maximum suppression (NMS) merging stage. The FCN uses a U-shaped network which directly produces text regions at either the word level or the text-line level.

The U-shaped FCN takes features from different layers of PVANet and then merges them to produce the outputs (in the original architecture diagram, the yellow boxes are the PVANet layers and the green boxes are the feature-merging layers). The reason behind this merging branch is to produce outputs for both small and large word regions: low-level features help in finding small word regions, while high-level features help in finding large word regions. The network outputs geometries either in the RBOX format (5 values: the distances from a pixel to the top, right, bottom, and left edges of the box, plus a rotation angle) or the QUAD format (the 4 corner coordinates of a quadrilateral), along with a score map that gives the confidence of text being present.

In the second stage, NMS merging, thresholding is used to exclude overlapping geometries and produce the most accurate geometries for the text regions.

To implement it in our OCR pipeline, we can use its GitHub repository. To make it workable, follow these steps:

  1. Clone the repository into your directory: "git clone https://github.com/argman/EAST.git"
  2. Download its pretrained model and put it inside the EAST directory.
  3. Before testing it, you need to compile the lanms module.
  4. To test the model, go to your EAST directory and run the evaluation script (eval.py) from the terminal, passing the paths to your test images, the downloaded checkpoint, and an output directory.

You can also train this model with your own dataset, either from scratch or starting from the pre-trained model provided earlier. To train it, you need to provide the dataset path; the dataset should consist of training images with corresponding text files containing the coordinates of the text present in each image.

Connectionist Text Proposal Network (CTPN)

CTPN is a deep learning method that accurately predicts text lines in a natural image. It is an end-to-end trainable model consisting of both CNN and RNN layers. In general, the length of a text line varies frequently. To solve this problem, the authors of the paper considered text lines as a sequence of fine-scale text proposals, where each proposal has a fixed width of 16 pixels and a varying height.

In the paper's figure, each vertical rectangular box is such a fine-scale text proposal. The model's architecture works as follows:

The input image is fed to a VGG-16 model, and the feature output from the conv5 layer (the layer just before the fully connected layers) is taken. A 3x3 sliding window is moved over these VGG-16 features, which are then fed sequentially to an RNN consisting of a 256-D bidirectional LSTM. This LSTM layer is connected to a 512-D fully connected layer, which then produces the outputs.

Now let's see how this algorithm generates its output.

  • This algorithm uses anchor boxes to detect text of different heights. Let's say we use k anchor boxes; then the output consists of three main parts.
  • The first is 2k vertical coordinates, where each anchor box has its y coordinate (the center position of the box) and the height of the anchor box.
  • The second is 2k text/non-text scores, and
  • the third is k side-refinement offsets.
  • Here they used 10 anchor boxes with heights varying between 11 and 273 pixels. For this, they fixed the horizontal location and predicted only the vertical heights.
  • On the basis of the text/non-text scores, sequential text proposals are merged and text lines are formed. The side-refinement offsets are used to refine the two end points of a text line.

To implement it in our OCR pipeline, we can use its GitHub repository. To make it workable, follow these steps:

  • Clone the repository into your directory: "git clone https://github.com/eragonruan/text-detection-ctpn.git"
  • Go to the "text-detection-ctpn-banjin-dev" directory
  • Run the setup commands from the repository's instructions one by one
  • Download the pretrained checkpoint from Google Drive
  • Extract it and put checkpoints_mlt/ in text-detection-ctpn/
  • Now put your image files in data/demo; the output will be saved in data/res
  • Now run the demo script provided in the repository to check the outputs

You can also train this model using your own data; just follow the steps provided in the GitHub repository.

A Single Shot Oriented Scene Text Detector (TextBoxes++)

It is an end-to-end trainable, fast scene text detector which can even detect oriented text in an image. It does not require any post-processing except non-maximum suppression. The basic idea is taken from the object detection algorithm SSD (Single Shot Detector). SSD aims to detect general objects in an image, but it fails when it comes to text detection. To improve on this for text datasets, TextBoxes++ was introduced. Let's see the model's architecture.

The first 13 layers are from the VGG-16 model. Then the 2 fully connected layers of VGG-16 are converted into convolution layers, which are followed by 8 more convolution layers. Finally, 6 Text-Box layers are connected to 6 different intermediate convolution layers of the model. These 6 Text-Box layers are the output layers, and at test time non-maximum suppression is applied to merge their results and keep the best predictions.

Text-Box layers are the key component of TextBoxes++. These are also convolutional layers, which predict both the presence of text and the bounding box coordinates. They include both oriented bounding boxes and minimum horizontal boxes. Text-Box layers are designed to tackle the problem of variable-length words.

You can find its GitHub repository here. In the repository, they have also implemented CRNN (convolutional recurrent neural network) to recognize the text detected by TextBoxes++. To implement it, you can follow the directions given in the repository, which also shows some sample results of TextBoxes++.


That’s enough for text detection, in the next blog, we will learn about text recognition. Hope you enjoy reading.

Next Blog: Optical Character Recognition Pipeline: Text Recognition

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Creating a CRNN model to recognize text in an image (Part-1)

In the earlier blogs, we learned various stages of optical character recognition pipeline. In this blog, we will create a convolutional recurrent neural network with CTC (Connectionist Temporal Classification) loss to implement our recognition model.

We will use the following steps to create our text recognition model.

  • Collecting Dataset
  • Preprocessing Data
  • Creating Network Architecture
  • Defining Loss function
  • Training model
  • Decoding outputs from prediction

Dataset

In this blog, we will use the data provided by the Visual Geometry Group. This is a huge dataset, totaling about 10 GB of images. Here I have used only 135000 images for the training set and 15000 images for the validation set. This data contains cropped text image segments, each containing a single word.

To download the dataset, you can either download it directly from this link or fetch it from the command line and then unzip it.

Preprocessing

Now that we have our dataset, we need to apply some preprocessing to make it acceptable for our model. We need to preprocess both the input images and the output labels. To preprocess the input images, we will do the following:

  • Read the image and convert it into a grayscale image
  • Make each image of size 128x32 (width x height) by using padding
  • Expand the image dimensions to add a channel axis, giving shape (32, 128, 1), to make it compatible with the input shape of the architecture
  • Normalize the image pixel values by dividing them by 255.

To preprocess the output labels, do the following:

  • Read the text from the name of the image, as the image name contains the text written inside the image.
  • Encode each character of a word into a numerical value by creating a mapping (e.g., 'a': 0, 'b': 1, ..., 'z': 25). Say we have the word 'abab'; then our encoded label would be [0, 1, 0, 1].
  • Compute the maximum length over all words and pad every output label to that maximum length. This is done to make the labels compatible with the output shape of our RNN architecture.

In the preprocessing step, we also need to create two other lists: one for the label lengths and the other for the input lengths to our RNN. These two lists are important for the CTC loss (we will see why later). The label length is the length of each output text label, and the input length is the same for every input to the LSTM layer, which is 31 in our architecture.

Note: For more details on the Optical Character Recognition , please refer to the Mastering OCR using Deep Learning and OpenCV-Python course.

Following is the code for our preprocessing step:
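The original code is not reproduced here; the following is a minimal sketch under the assumptions that the character set is a-z, A-Z, 0-9 and that each file name contains the ground-truth word (as in the VGG synthetic dataset):

```python
import os
import string
import cv2
import numpy as np
from keras.preprocessing.sequence import pad_sequences

char_list = string.ascii_letters + string.digits      # 62 characters (assumption)

def encode_to_labels(txt):
    """Encode each character of a word into its index in char_list."""
    return [char_list.index(ch) for ch in txt]

def preprocess_image(path):
    """Read as grayscale, pad/resize to width 128 x height 32, add channel, normalize."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    h, w = img.shape
    if h > 32 or w > 128:
        img = cv2.resize(img, (128, 32))
    else:
        img = cv2.copyMakeBorder(img, 0, 32 - h, 0, 128 - w,
                                 cv2.BORDER_CONSTANT, value=255)
    img = np.expand_dims(img, axis=2)                  # shape (32, 128, 1)
    return img / 255.0

train_images, train_labels = [], []
train_input_length, train_label_length = [], []

for fname in os.listdir('train_data'):                 # folder name is an assumption
    text = fname.split('_')[1]                         # word is encoded in the file name
    train_images.append(preprocess_image(os.path.join('train_data', fname)))
    train_labels.append(encode_to_labels(text))
    train_label_length.append(len(text))
    train_input_length.append(31)                      # time steps produced by the CNN

max_label_len = max(train_label_length)
train_labels = pad_sequences(train_labels, maxlen=max_label_len,
                             padding='post', value=len(char_list))
train_images = np.array(train_images)
train_input_length = np.array(train_input_length)
train_label_length = np.array(train_label_length)
```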

Now you might have got some feeling about the training and validation data generation for our recognition model. In the next blog, we will use this data to train and test our neural network.

Next Blog: Creating a CRNN model to recognize text in an image (Part-2)

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Creating a CRNN model to recognize text in an image (Part-2)

In the previous blog, we have seen how to create the training and validation datasets for our recognition model (download and preprocess). In this blog, we will create our model architecture and train it with the preprocessed data.

You can find full code here.

Model = CNN + RNN + CTC loss

Our model consists of three parts:

  1. The convolutional neural network to extract features from the image
  2. Recurrent neural network to predict sequential output per time-step
  3. CTC loss function, which is the transcription layer used to predict the output for each time step.

Model Architecture

Here is the model architecture that we used:

This network architecture is inspired by this paper. Let’s see the steps that we used to create the architecture:

  1. The input to our architecture is an image of height 32 and width 128.
  2. Here we used seven convolution layers, of which six have kernel size (3,3) and the last one has size (2,2). The number of filters increases from 64 to 512 layer by layer.
  3. Two max-pooling layers of size (2,2) are added, and then two max-pooling layers of size (2,1) are added to extract features with a larger width so as to predict long texts.
  4. Also, we used batch normalization layers after the fifth and sixth convolution layers, which accelerates the training process.
  5. Then we used a lambda function to squeeze the output from the conv layer and make it compatible with the LSTM layer.
  6. Then we used two bidirectional LSTM layers, each with 128 units. This RNN layer gives an output of size (batch_size, 31, 63), where 63 is the total number of output classes including the blank character.

Let’s see the code for this architecture:
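The original code is not reproduced here; below is a sketch consistent with the description above (62 characters plus the CTC blank gives 63 output classes):

```python
from keras.models import Model
from keras.layers import (Input, Conv2D, MaxPool2D, BatchNormalization,
                          Lambda, Bidirectional, LSTM, Dense)
import keras.backend as K

num_of_characters = 62                       # assumption: a-z, A-Z, 0-9

inputs = Input(shape=(32, 128, 1))

# seven convolution layers, filters growing from 64 to 512
conv_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
pool_1 = MaxPool2D(pool_size=(2, 2))(conv_1)
conv_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_1)
pool_2 = MaxPool2D(pool_size=(2, 2))(conv_2)
conv_3 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_2)
conv_4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_3)
pool_4 = MaxPool2D(pool_size=(2, 1))(conv_4)               # pool only along the height
conv_5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_4)
batch_norm_5 = BatchNormalization()(conv_5)
conv_6 = Conv2D(512, (3, 3), activation='relu', padding='same')(batch_norm_5)
batch_norm_6 = BatchNormalization()(conv_6)
pool_6 = MaxPool2D(pool_size=(2, 1))(batch_norm_6)
conv_7 = Conv2D(512, (2, 2), activation='relu')(pool_6)    # output shape (1, 31, 512)

# squeeze out the height dimension so the LSTMs see a sequence of 31 time steps
squeezed = Lambda(lambda x: K.squeeze(x, 1))(conv_7)

blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(squeezed)
blstm_2 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(blstm_1)

# 63 classes per time step: 62 characters + 1 blank
outputs = Dense(num_of_characters + 1, activation='softmax')(blstm_2)

act_model = Model(inputs, outputs)           # used later for prediction at test time
```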

Loss Function

Now we have prepared model architecture, the next thing is to choose a loss function. In this text recognition problem, we will use the CTC loss function.

CTC loss is very helpful in text recognition problems. It saves us from annotating each time step and gets rid of the problem where a single character spans multiple time steps, which would otherwise need further processing. If you want to know more about CTC (Connectionist Temporal Classification), please follow this blog.

Note: For more details on the Optical Character Recognition , please refer to the Mastering OCR using Deep Learning and OpenCV-Python course.

A CTC loss function requires four arguments to compute the loss: the predicted outputs, the ground truth labels, the input sequence length to the LSTM, and the ground truth label length. To get this, we need to create a custom loss function and then pass it to the model. To make it compatible with our model, we will create a model which takes these four inputs and outputs the loss. This model will be used for training, and for testing we will use the model that we created earlier, "act_model". Let's see the code:
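A sketch of this wrapper model using the standard Keras CTC pattern (max_label_len is assumed to come from the preprocessing step in Part-1):

```python
labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [outputs, labels, input_length, label_length])

# the training model takes the images plus the three CTC inputs and outputs the loss
model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
```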

Compile and Train the Model

To train the model, we will use the Adam optimizer. Also, we can use the Keras callbacks functionality to save the weights of the best model on the basis of validation loss.
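For example:

```python
from keras.callbacks import ModelCheckpoint

# the loss is already computed inside the model, so the compile loss just passes y_pred through
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adam')

checkpoint = ModelCheckpoint(filepath='best_model.hdf5', monitor='val_loss',
                             verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
```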

In model.compile(), you can see that I have only taken y_pred and neglected y_true. This is because I have already taken labels as input to the model earlier.

Now train your model on 135000 training images and 15000 validation images.
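A sketch of the training call (the validation arrays are assumed to have been prepared exactly like the training arrays in Part-1; the batch size and epoch count are arbitrary):

```python
import numpy as np

model.fit(x=[train_images, train_labels, train_input_length, train_label_length],
          y=np.zeros(len(train_images)),         # dummy targets, ignored by the loss
          batch_size=256, epochs=10,
          validation_data=([valid_images, valid_labels, valid_input_length,
                            valid_label_length], np.zeros(len(valid_images))),
          callbacks=callbacks_list, verbose=1)
```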

Test the model

Our model is now trained with 135000 images. Now it's time to test the model. We cannot use our training model because it also requires the labels as input, and at test time we do not have labels. So to test the model, we will use "act_model", which we created earlier and which takes only one input: the test images.

As our model predicts the probability for each class at each time step, we need to use some transcription function to convert it into actual texts. Here we will use the CTC decoder to get the output text. Let’s see the code:
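A sketch of prediction plus greedy CTC decoding (char_list and valid_images are assumed to come from the earlier steps):

```python
import numpy as np
import keras.backend as K

act_model.load_weights('best_model.hdf5')        # load the best saved weights

prediction = act_model.predict(valid_images)     # shape (num_images, 31, 63)

decoded = K.get_value(K.ctc_decode(
    prediction,
    input_length=np.ones(prediction.shape[0]) * prediction.shape[1],
    greedy=True)[0][0])

for seq in decoded[:10]:
    text = ''.join(char_list[int(p)] for p in seq if int(p) != -1)
    print(text)
```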

Here are some results from the trained model:

Pretty good Yeah! Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Optical Character Recognition Pipeline: Generating Dataset

The first step to create any deep learning model is to generate the dataset. In continuation of our optical character recognition pipeline, in this blog, we will see how we can get our training and test data.

In our OCR pipeline, we first need to get data for both segmentation and recognition (text). For the segmentation part, the data will consist of images and corresponding files containing the coordinates of the words present in each image.

For the recognition part, the data will consist of images and their corresponding text files. Here, each segmented image will contain a single word.

Figure: a segmented word image and its corresponding text.

Open Source Dataset:

There are some open source datasets available for our pipeline, including several useful ones for the segmentation part.

There are also open source datasets for text recognition (images and their corresponding texts).

Synthetic Data:

In some cases, training your OCR model with synthetic data can also be useful. You can create your own synthetic data using a Python script. You can also add some geometric transformations to simulate real-world distortions in the data. As an example, here is a script to generate synthetic data for text recognition:
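A minimal sketch of such a script using Pillow (the word list and font file names are assumptions; use fonts available on your system):

```python
import os
import random
from PIL import Image, ImageDraw, ImageFont

words = ['hello', 'world', 'learner', 'vision', 'keras']      # sample word list
fonts = ['arial.ttf', 'times.ttf', 'calibri.ttf']              # font files on your system

os.makedirs('synthetic_data', exist_ok=True)

for i, word in enumerate(words):
    font = ImageFont.truetype(random.choice(fonts), 14)        # font size 14
    img = Image.new('L', (128, 32), color=255)                 # white grayscale canvas
    draw = ImageDraw.Draw(img)
    draw.text((5, 8), word, font=font, fill=0)                 # render the word in black

    img.save('synthetic_data/{}.jpg'.format(i))                # segmented word image
    with open('synthetic_data/{}.txt'.format(i), 'w') as f:    # corresponding text file
        f.write(word)
```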

In the above code, I have generated English word images and corresponding text files using different font types with a font size of 14.


Annotation Tools and Manual Data:

Another way to create a text segmentation dataset is by using annotation tools. In this case, you need to collect images manually, or get them from the internet, and then manually annotate the text in the images with bounding boxes. Annotation tools like LabelImg work well for this.

That’s all to generate the dataset. In the next blog, we will see image preprocessing steps to apply to these datasets. Hope you enjoy reading.

Next Blog: Optical Character Recognition Pipeline: Image Preprocessing

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Monitoring Training in Keras: Callbacks

Keras is a high-level API that can run on top of TensorFlow, CNTK, and Theano. Keras is preferred because it is easy and fast to learn. In this blog, we will learn about a set of functions called callbacks, which are used during training in Keras.

Callbacks provide some advantages over normal training in Keras. Here I will explain the important ones.

  • A callback can terminate training when a NaN loss occurs.
  • A callback can save the model after every epoch; you can also save only the best model.
  • Early stopping: a callback can stop training when the monitored quantity (e.g., accuracy) stops improving.

Terminate training when a NaN loss occurs

Let's see how to terminate training when a NaN loss occurs:
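A minimal sketch (assuming a compiled model and training data already exist):

```python
from keras.callbacks import TerminateOnNaN

term_nan = TerminateOnNaN()

model.fit(x_train, y_train, epochs=10, callbacks=[term_nan])
```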

Saving Model using Callbacks

To save the model after every epoch in Keras, we need to import ModelCheckpoint from keras.callbacks. Let's see the code below, which will save the model whenever the validation loss decreases.
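A sketch (assuming a compiled model and validation data; the file-name pattern is an illustrative choice):

```python
from keras.callbacks import ModelCheckpoint

# one file per improvement; use a fixed name such as 'best_model.hdf5' to keep only the best
filepath = 'model-{epoch:02d}-{val_loss:.2f}.hdf5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                             save_best_only=True, mode='min')
callbacks_list = [checkpoint]

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=10, callbacks=callbacks_list)
```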

In the above code first we have created a ModelCheckpoint object by passing its required parameters.

  • "filepath" defines the path where the checkpoints will be saved. If you want to keep only the best model, pass a fixed filepath such as "best_model.hdf5", which will overwrite the previously saved checkpoint.
  • "monitor" decides which quantity you want to monitor while training.
  • "save_best_only", if True, saves the model only when the monitored quantity improves.
  • "mode" is one of {auto, min, max}. For instance, with "max" the model is saved whenever the monitored quantity reaches a new maximum.

Then, finally, make a callbacks list and pass it to model.fit() via the callbacks parameter.

Early Stopping

Callbacks can stop training when a monitored quantity has stopped improving. Let's see how:
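A sketch (the parameter values are illustrative):

```python
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5,
                           mode='auto', baseline=None, verbose=1)

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, callbacks=[early_stop])
```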

  • min_delta: the minimum change in the monitored quantity to qualify as an improvement.
  • patience: the number of epochs with no improvement after which training will be stopped.
  • mode: in auto mode, the direction is inferred automatically from the name of the monitored quantity.
  • baseline: the baseline value for the monitored quantity; training will stop if the model doesn't show improvement over it.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.