Creating a CRNN model to recognize text in an image (Part-1)

In the earlier blogs, we learned the various stages of the optical character recognition pipeline. In this blog, we will create a convolutional recurrent neural network (CRNN) with CTC (Connectionist Temporal Classification) loss to implement our recognition model.

We will use the following steps to create our text recognition model.

  • Collecting Dataset
  • Preprocessing Data
  • Creating Network Architecture
  • Defining Loss function
  • Training model
  • Decoding outputs from prediction

Dataset

In this blog, we will use the data provided by the Visual Geometry Group. This is a huge dataset, about 10 GB of images in total. Here I have used only 135,000 images for the training set and 15,000 images for the validation set. The data contains text image segments that look like the images shown below:

To download the dataset, you can either download it directly from this link or use the following commands to download and unzip the data.

Preprocessing

Now that we have our dataset, we need to apply some preprocessing to make it acceptable to our model. We need to preprocess both the input images and the output labels. To preprocess an input image, we will do the following:

  • Read the image and convert it into a gray-scale image
  • Make each image of size (128,32) by using padding
  • Expand the image dimensions to (128,32,1) to make it compatible with the input shape of the architecture
  • Normalize the image pixel values by dividing them by 255

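The image steps above can be sketched as follows (a minimal illustration in NumPy; the function name is my own, the image is assumed to already be a gray-scale array, e.g. from `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`, and oversized images would need an extra resize):

```python
import numpy as np

def preprocess_image(img):
    """Pad a gray-scale word image (H x W uint8 array) to width 128 and
    height 32, add a channel axis, and scale pixel values to [0, 1]."""
    h, w = img.shape
    canvas = np.full((32, 128), 255, dtype=np.uint8)  # white background
    canvas[:h, :w] = img[:32, :128]                   # paste top-left, pad the rest
    img = np.expand_dims(canvas, axis=-1)             # -> (32, 128, 1)
    return img.astype(np.float32) / 255.0             # normalize by 255
```

For example, a 20x60 crop comes out as a (32, 128, 1) float array with white padding on the right and bottom.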
To preprocess the output labels, do the following:

  • Read the text from the name of the image, as the image name contains the text written inside the image.
  • Encode each character of a word into a numerical value by creating a mapping ( e.g. ‘a’:0, ‘b’:1, …, ‘z’:25 ). Say we have the word ‘abab’; then our encoded label would be [0,1,0,1].
  • Compute the maximum length over all words and pad every output label to that maximum length. This is done to make it compatible with the output shape of our RNN architecture.

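The encoding described above can be sketched like this (the function name, the lower-case-only alphabet, and the -1 padding value are my assumptions for illustration):

```python
import string

# Hypothetical character set: lower-case letters only, so 'a' -> 0 ... 'z' -> 25
CHAR_LIST = string.ascii_lowercase

def encode_label(text, max_len):
    """Map each character of a word to its index in CHAR_LIST, then pad
    the result to max_len (the longest word) with a filler value of -1."""
    encoded = [CHAR_LIST.index(ch) for ch in text]
    return encoded + [-1] * (max_len - len(encoded))
```

For example, `encode_label('abab', 6)` returns `[0, 1, 0, 1, -1, -1]`.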
In the preprocessing step we also need to create two other lists: one holding the label lengths and the other the input lengths to our RNN. These two lists are important for our CTC loss ( we will see why later ). The label length is the length of each output text label, while the input length is the same for every input to the LSTM layer, which is 31 in our architecture.

Note: For more details on Optical Character Recognition, please refer to the Mastering OCR using Deep Learning and OpenCV-Python course.

Following is the code for our preprocessing step:
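The original code listing did not survive in this copy, so below is a minimal self-contained sketch of the whole step (the alphabet, the maximum label length, and all helper names are my assumptions; it pads each image to 128x32, encodes the label, and builds the two length lists the CTC loss needs):

```python
import numpy as np

CHAR_LIST = 'abcdefghijklmnopqrstuvwxyz'   # assumed alphabet; extend for digits etc.
MAX_LABEL_LEN = 16                         # assumed longest word in the dataset
RNN_STEPS = 31                             # time-steps produced by the LSTM layer

def encode(text):
    """Character -> index, padded to MAX_LABEL_LEN with a filler value of -1."""
    y = [CHAR_LIST.index(c) for c in text.lower()]
    return y + [-1] * (MAX_LABEL_LEN - len(y))

def build_arrays(samples):
    """samples: list of (gray-scale uint8 image, ground-truth text) pairs.
    Returns the four arrays needed to train with CTC loss."""
    images, labels, label_length, input_length = [], [], [], []
    for img, text in samples:
        canvas = np.full((32, 128), 255, dtype=np.uint8)   # pad to 128x32 with white
        h, w = img.shape
        canvas[:h, :w] = img[:32, :128]
        images.append(np.expand_dims(canvas, -1) / 255.0)  # (32,128,1), values in [0,1]
        labels.append(encode(text))
        label_length.append(len(text))                     # true length of each label
        input_length.append(RNN_STEPS)                     # same for every image
    return (np.array(images), np.array(labels),
            np.array(label_length), np.array(input_length))
```

A quick usage example: `build_arrays([(img, 'abab')])` yields an image batch of shape (1, 32, 128, 1), the encoded label starting [0, 1, 0, 1], a label length of 4, and an input length of 31.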

Now you should have some feeling for how the training and validation data for our recognition model are generated. In the next blog, we will use this data to train and test our neural network.

Next Blog: Creating a CRNN model to recognize text in an image (Part-2)

Hope you enjoy reading.

If you have any doubts/suggestions, please feel free to ask, and I will do my best to help or improve myself. Good-bye until next time.

10 thoughts on “Creating a CRNN model to recognize text in an image (Part-1)”

  1. Deepthi

    I got this following error.

    InvalidArgumentError: Not enough time for target transition sequence (required: 37, available: 31). You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
    [[{{node ctc_3/CTCLoss}}]]

    Each time I run model.fit, the required number changes. How and what do I change in the code? I have implemented the same code as in your post, changing only minor things according to my requirements.

  2. lakshmi

    Hi, can you please help me with how to set up the dataset? I have downloaded the total dataset, but when I try to execute with some 10% of the data it gives the error “ValueError: Error when checking input: expected input_1 to have 4 dimensions, but got array with shape (0, 1)”. Could you please help?

    1. Alejandro Soumah

      That is because you are having a problem locating the dataset. Check that your dataset is unzipped and in the location that it says.

  3. Kent Chen

    Is it possible to convert the Keras model to TensorRT (I want to run it on an NVIDIA Jetson Nano)? I tried to convert to ONNX, but failed to convert ONNX to TensorRT.

