Removing Text highlighter using Colorspace OpenCV-Python

Have you ever thought why a number of color models are available in OpenCV? Obviously, they might have some pros and cons. So, in this blog, we will discuss one such application of color models where we will learn to remove the highlighted area from the text.

Use Case:

This pre-processing step (removing text highlighter) can be quite useful before feeding the image to an OCR system. Otherwise, the OCR system will output erroneous results.

Problem Overview

Suppose we are given an image as shown on the left and we want to pre-process it to remove the highlighter from the text as shown by the right image below.

Approach:

Since we know that there are some color models available (such as HSV) where it is easy to represent the color as compared to the RGB model. So, we will convert the image from RGB to that colorspace and then remove the color information. For instance, in the HSV color model, H and S tell us about the chromaticity (color information) of the light while V carries the greyscale information. So in HSV, if we remove the H and S channel and only keep the V channel we can obtain the desired results.

Steps:

  • Read the highlighted text image
  • Convert from BGR to HSV colorspace using cv2.cvtColor()
  • Extract the V channel

Code:

So, you saw that just by changing the colorspace and extracting channels we obtained satisfactory results. We can further improve the results by applying other operations such as thresholding or morphological operations etc. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Leave a Reply