Optical Character Recognition Pipeline: Text Recognition

In the previous blogs, we covered the OCR text detection step. Now it’s time to move on to the next component of the OCR pipeline, which is Text Recognition. So, let’s get started.

Text Recognition

As you might remember, in the text detection step we segmented out the text regions. Now it’s time to recognize what text is present in those segments. This is known as Text Recognition. For instance, see the image below, where we have the segments on the left and the recognized text on the right. This is what we want: to recognize the text present in each segment.

So, we will pass each segment one by one to our text recognition model, which outputs the recognized text. In general, the Text Recognition step produces a text file that contains each segment’s bounding-box coordinates along with the recognized text. For instance, see the image below (right), which contains 3 columns: the segment name, the coordinates, and the recognized text.
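As a rough illustration of this loop, the sketch below crops each detected segment, runs it through a recognizer, and writes one line per segment with its name, coordinates, and recognized text. Here pytesseract is only a stand-in for whatever recognition model we build later, and the box format and output layout are assumptions for illustration.

import cv2
import pytesseract

def recognize_segments(image_path, boxes, out_path="recognized.txt"):
    """boxes: list of (name, x, y, w, h) tuples coming from the text detection step."""
    image = cv2.imread(image_path)
    with open(out_path, "w") as f:
        for name, x, y, w, h in boxes:
            segment = image[y:y + h, x:x + w]                 # crop the detected region
            text = pytesseract.image_to_string(segment).strip()  # stand-in recognizer
            # one line per segment: name, bounding-box coordinates, recognized text
            f.write(f"{name}\t{x},{y},{w},{h}\t{text}\n")

The resulting file is exactly the three-column output described above, and it is what the later Restructuring step will consume.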

Now, you may ask: why coordinates? This will become clear when we discuss Restructuring (the next step).

Similar to text detection, text recognition has long been a research topic in computer vision. Traditional text recognition methods generally consist of 3 main steps (a rough character-level sketch follows this list):

  • Image pre-processing
  • Character segmentation
  • Character recognition
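To make the character-level idea concrete, here is a rough sketch of such a traditional pipeline with OpenCV: binarize the segment, split it into per-character blobs using contours, and classify each crop. classify_character is a hypothetical stand-in for a trained character classifier, not something defined in this series.

import cv2

def traditional_recognize(segment, classify_character):
    """Character-level recognition: pre-process, segment characters, classify each one."""
    gray = cv2.cvtColor(segment, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # OpenCV 4.x signature: contours come back first
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # sort the character blobs left to right before classifying them
    boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])
    return "".join(classify_character(binary[y:y + h, x:x + w]) for x, y, w, h in boxes)

It is easy to see where this breaks: with a cluttered background or touching characters, the thresholding and contour step no longer gives one clean blob per character.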

That is, they mainly work at the character level. But when we deal with images with complex backgrounds, unusual fonts, or other distortions, character segmentation becomes a really challenging task. Thus, to avoid character segmentation, two major approaches are adopted (a short preview follows the list below):

  • Connectionist Temporal Classification (CTC) based
  • Attention-based
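As a tiny preview of the CTC idea (covered properly in the next blog): the model predicts one label per time-step across the width of the segment, and a greedy decoder simply collapses repeated labels and drops a special blank symbol, so no explicit character segmentation is needed. The alphabet and per-step labels below are made up purely for illustration.

BLANK = "-"

def ctc_greedy_decode(per_step_labels):
    """Collapse consecutive repeats, then remove blanks."""
    decoded = []
    prev = None
    for label in per_step_labels:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return "".join(decoded)

# e.g. the per-time-step argmax of a network's output for the word "hello"
print(ctc_greedy_decode(["h", "h", "e", "-", "l", "l", "-", "l", "o", "o"]))  # hello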

In the next blog, we will understand in detail what CTC is and how it is used in Text Recognition. Then we will move on to the attention-based algorithms. Till then, have a great time. Hope you enjoy reading.

If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve myself. Good-bye until next time.
