Optical Character Recognition Pipeline: Text Detection and Segmentation

One of the most important modules in an optical character recognition (OCR) pipeline is text detection and segmentation, also called text localization. In the previous blog, we saw various techniques to pre-process the input image that can help improve OCR accuracy. In this blog, we will learn how to localize text in an image so that we can crop the text regions out and feed them to our text recognition module to predict the text they contain.

What is text detection and segmentation?

It is the process of localizing every occurrence of text in an image into meaningful units such as characters, words, and text lines, and then producing a segment for each of these units.

Character-based detection first detects individual characters and then groups them into words. One way to do this is to locate characters by classifying Maximally Stable Extremal Regions (MSER) and then group the detected characters into words with an exhaustive search method.

Word-based detection usually works in a similar fashion to object detection. You can use algorithms such as Faster R-CNN and YOLO to perform this.

Text-line-based detection detects whole text lines and then breaks them into individual words.

There are basically two types of text images that are fed to the text recognition module as inputs. One is scanned documents and the other is natural scene text, like street signs, storefront text, etc.

Scanned Documents

Scanned documents generally contain hundreds or thousands of words. We can apply deep neural networks like Faster R-CNN and YOLO to localize the words present in a document. But sometimes these may not be able to localize all the text present in the image, because these algorithms are generally trained to detect a small number of objects per image. In that case, we need to apply some post-processing after the deep network to detect the remaining text.
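As an illustration, here is a minimal sketch of word-based detection with Faster R-CNN using torchvision. The file names are hypothetical, and the stock model shown is pre-trained on COCO rather than on text, so in practice you would fine-tune it on a word-level dataset before using it this way.

```python
# A minimal sketch of word-based detection with Faster R-CNN (PyTorch/torchvision).
# Note: this stock model is trained on COCO, not text; assume a fine-tuned model in practice.
import cv2
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = cv2.imread("scanned_page.jpg")                      # hypothetical input file
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0

with torch.no_grad():
    predictions = model([tensor])[0]

# Keep confident detections and crop them for the recognition module
for box, score in zip(predictions["boxes"], predictions["scores"]):
    if score < 0.5:
        continue
    x1, y1, x2, y2 = box.int().tolist()
    word_crop = image[y1:y2, x1:x2]   # feed this crop to the text recognizer
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imwrite("words_detected.jpg", image)
```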

Another method we can use for scanned documents is Maximally Stable Extremal Regions (MSER), available in OpenCV.

MSER is a method used for blob detection in images. Using it, we can get the coordinates of the text regions and then generate bounding boxes around each word in the image, which gives us the cropped inputs required by our text recognition module.
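Below is a minimal sketch of MSER-based region detection with OpenCV. The image path and threshold-free defaults are illustrative, not taken from the original post.

```python
# A minimal sketch of MSER-based text region detection with OpenCV.
import cv2

image = cv2.imread("scanned_page.jpg")          # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)      # stable regions and their bounding boxes

for (x, y, w, h) in bboxes:
    # Each box is a candidate character/word region; crop it for recognition,
    # or merge nearby boxes into word-level boxes in a post-processing step.
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 1)

cv2.imwrite("mser_regions.jpg", image)
```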

Natural Scenes

Natural scenes contain fewer words, but they come with other problems like distortion, occlusion, directional blur, cluttered backgrounds, etc. To overcome these problems, we need deep learning algorithms designed specifically for natural scene text that are robust to such distortions. There are some robust open-source algorithms available, such as EAST, CTPN, TextBoxes++, and PixelLink. These algorithms can also be used for localizing text in scanned documents, but then you need to do some post-processing to detect all the text present in the image, as mentioned earlier.
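As an example, here is a minimal sketch of running the EAST detector through OpenCV's dnn module. It assumes you have downloaded the pre-trained frozen_east_text_detection.pb graph; the input file name and thresholds are illustrative, and the decoding step follows the standard EAST output format (a score map plus a geometry map).

```python
# A minimal sketch of scene-text detection with the EAST model via OpenCV's dnn module.
# Assumes the pre-trained "frozen_east_text_detection.pb" graph has been downloaded.
import cv2
import numpy as np

net = cv2.dnn.readNet("frozen_east_text_detection.pb")
layer_names = ["feature_fusion/Conv_7/Sigmoid",    # text/no-text score map
               "feature_fusion/concat_3"]          # box geometry map

image = cv2.imread("street_sign.jpg")              # hypothetical input file
orig_h, orig_w = image.shape[:2]
new_w, new_h = 320, 320                            # EAST needs dimensions divisible by 32
ratio_w, ratio_h = orig_w / new_w, orig_h / new_h

blob = cv2.dnn.blobFromImage(image, 1.0, (new_w, new_h),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(layer_names)

boxes, confidences = [], []
rows, cols = scores.shape[2:4]
for y in range(rows):
    for x in range(cols):
        score = scores[0, 0, y, x]
        if score < 0.5:
            continue
        # Each output cell corresponds to a 4x4 patch of the resized image
        offset_x, offset_y = x * 4.0, y * 4.0
        angle = geometry[0, 4, y, x]
        cos, sin = np.cos(angle), np.sin(angle)
        h = geometry[0, 0, y, x] + geometry[0, 2, y, x]
        w = geometry[0, 1, y, x] + geometry[0, 3, y, x]
        end_x = int(offset_x + cos * geometry[0, 1, y, x] + sin * geometry[0, 2, y, x])
        end_y = int(offset_y - sin * geometry[0, 1, y, x] + cos * geometry[0, 2, y, x])
        boxes.append([int(end_x - w), int(end_y - h), int(w), int(h)])
        confidences.append(float(score))

# Non-maximum suppression to merge overlapping detections
indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(indices).flatten():
    x, y, w, h = boxes[i]
    # Rescale the box back to the original image size
    x1, y1 = int(x * ratio_w), int(y * ratio_h)
    x2, y2 = int((x + w) * ratio_w), int((y + h) * ratio_h)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imwrite("east_detections.jpg", image)
```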

So far, we have seen what text detection and segmentation is and the different algorithms for localizing text in an image. In the next blog, we will dive deeper into these algorithms and figure out how to implement them in our OCR pipeline.

Next Blog: Optical Character Recognition Pipeline: Text Detection and Segmentation Part-II

Hope you enjoyed reading.

If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve myself. Goodbye until next time.
