
Creating a CRNN model to recognize text in an image (Part-1)

In the earlier blogs, we learned the various stages of the optical character recognition (OCR) pipeline. In this blog, we will create a convolutional recurrent neural network (CRNN) trained with CTC (Connectionist Temporal Classification) loss to implement our recognition model.

We will use the following steps to create our text recognition model.

  • Collecting Dataset
  • Preprocessing Data
  • Creating Network Architecture
  • Defining Loss function
  • Training model
  • Decoding outputs from prediction


In this blog, we will use data provided by the Visual Geometry Group (VGG). It is a huge dataset, about 10 GB of images in total. Here, I have used only 135,000 images for the training set and 15,000 images for the validation set. The data consists of text image segments that look like the images shown below:

To download the dataset, you can either download it directly from this link or use the following commands to download and unzip the data.


Now that we have our dataset, we need to preprocess it so that our model can accept it. Both the input images and the output labels need preprocessing. To preprocess the input images, we will use the following steps:

  • Read the image and convert it into a grayscale image
  • Make each image 128 pixels wide and 32 pixels high by padding it
  • Expand the image dimensions to add a single channel axis (a NumPy array of shape (32, 128, 1), i.e. height × width × channels) to make it compatible with the input shape of the architecture
  • Normalize the pixel values by dividing them by 255.

To preprocess the output labels, we use the following steps:

  • Read the text from the image file name, since each file name contains the text written inside the image.
  • Encode each character of a word as a numerical value using a mapping such as ‘a’:0, ‘b’:1, …, ‘z’:25. For example, the word ‘abab’ would be encoded as [0,1,0,1].
  • Compute the maximum word length and pad every encoded label to that length. This makes the labels compatible with the output shape of our RNN architecture.
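The label encoding and padding steps can be sketched like this. The character set, the helper names, and the padding value (one index past the last real character) are all hypothetical choices made for illustration. The character set happens to reproduce the ‘a’:0 … ‘z’:25 mapping described above.

```python
import string

# Hypothetical character set: a-z, A-Z, 0-9, so 'a' -> 0, ..., 'z' -> 25
CHAR_LIST = string.ascii_letters + string.digits
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHAR_LIST)}
PAD_VALUE = len(CHAR_LIST)  # assumed padding index outside the real classes


def encode_word(word):
    """Map each character of a word to its integer index."""
    return [CHAR_TO_INDEX[ch] for ch in word]


def pad_labels(encoded, max_len, pad_value=PAD_VALUE):
    """Right-pad every encoded label to the same length."""
    return [label + [pad_value] * (max_len - len(label)) for label in encoded]


words = ["abab", "cat"]
encoded = [encode_word(w) for w in words]   # [[0, 1, 0, 1], [2, 0, 19]]
max_len = max(len(w) for w in words)        # 4
padded = pad_labels(encoded, max_len)       # [[0, 1, 0, 1], [2, 0, 19, 62]]
```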

In the preprocessing step, we also need to create two more lists: the label length and the input length to our RNN. These two lists are important for our CTC loss (we will see why later). The label length is the length of each output text label. The input length is the number of time steps fed to the LSTM layer, which is the same for every input: 31 in our architecture.
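Here is a sketch of how these two lists could be built, using the 31 time steps stated above. The helper names are hypothetical. The length check reflects a general CTC constraint: each character needs one time step, and two identical adjacent characters need an extra blank step between them.

```python
import numpy as np

RNN_TIMESTEPS = 31  # time steps the LSTM sees for every image (from the text)


def min_ctc_steps(word):
    """Time steps CTC needs: one per character, plus one blank between
    each pair of identical adjacent characters (e.g. the 'oo' in 'book')."""
    repeats = sum(1 for a, b in zip(word, word[1:]) if a == b)
    return len(word) + repeats


def ctc_lengths(words, timesteps=RNN_TIMESTEPS):
    """Return (label_length, input_length) arrays for a list of words."""
    for w in words:
        if min_ctc_steps(w) > timesteps:
            raise ValueError(f"{w!r} is too long for {timesteps} time steps")
    label_length = np.array([len(w) for w in words])   # true (unpadded) lengths
    input_length = np.full(len(words), timesteps)      # same for every sample
    return label_length, input_length


label_length, input_length = ctc_lengths(["abab", "cat"])
```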

Note: For more details on Optical Character Recognition, please refer to the Mastering OCR using Deep Learning and OpenCV-Python course.

Following is the code for our preprocessing step:

By now, you should have a feel for how the training and validation data for our recognition model are generated. In the next blog, we will use this data to train and test our neural network.

Next Blog: Creating a CRNN model to recognize text in an image (Part-2)

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Image Enhancement

So far, we have learned the basics of an image. From now on, we will learn what is actually known as image processing. In this blog, we will learn what image enhancement is and the different methods to perform it. Then we will see how to apply it to real images.

According to MathWorks, image enhancement is the process of adjusting digital images so that the results are more suitable for display or further image analysis. It is basically a preprocessing step.

Image enhancement can be done either in the spatial domain or in a transform domain. In the spatial domain, we perform all operations directly on the pixels. In a transform domain, we first transform the image into another domain (such as frequency), do the processing there, and then convert the result back to the spatial domain with the inverse transform. We will discuss these in detail in the next blogs.

Both the spatial and transform domains have their own importance, which we will discuss later. Generally, operations in the spatial domain are more computationally efficient.

Processing in the spatial domain can be divided into two main categories. Intensity transformations operate on single pixels, while spatial filtering works on the neighborhood of every pixel.

The following example gives a taste of what we are going to study in the next few blogs:

Figure: the same image before and after contrast enhancement
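The figure does not say which method produced the enhanced image. As one illustration only, here is min-max contrast stretching, a simple intensity transformation that linearly maps the darkest pixel to 0 and the brightest to 255. OpenCV's cv2.equalizeHist() is another common choice for 8-bit grayscale images.

```python
import numpy as np


def contrast_stretch(img):
    """Linearly map [min, max] of an 8-bit image onto the full [0, 255]."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                      # flat image: nothing to stretch
        return img.copy()
    out = (img.astype(np.float32) - lo) * 255.0 / (hi - lo)
    return np.round(out).astype(np.uint8)


low_contrast = np.array([[100, 150], [125, 200]], dtype=np.uint8)
stretched = contrast_stretch(low_contrast)  # now spans the full 0..255 range
```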

In the next blog, we will discuss how basic arithmetic operations such as addition and subtraction can be used for image enhancement. Hope you enjoy reading.


Color Models

In the previous blogs, we represented color images using RGB components, but this is not the only option. There are different color models (a color model is simply a way to define color), each with its own pros and cons.

There are two types of color models: additive and subtractive. Additive models use transmitted light to display color, while subtractive models use printing inks. These models can be arranged into different geometric shapes to obtain new models (see the HSI model below).

In this blog, we’ll discuss the three models most commonly used in digital image processing: RGB, CMY(K), and HSI.

The RGB Color Model

In this model, we construct a color cube whose three axes denote R, G, and B respectively, as shown below:

Normalized RGB Color Cube; Source: Researchgate

This is an additive model, i.e. the colors present in the light add up to form new colors. For example, yellow has the coordinates (1,1,0), which means Yellow = Red + Green. Similarly, Cyan = Blue + Green and Magenta = Red + Blue.

R, G, and B are added together in varying proportions to produce a wide range of colors. Mixing equal proportions of R, G, and B produces a shade of gray. Such colors lie on the grayscale line, the diagonal of the cube from black (0,0,0) to white (1,1,1).
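Additive mixing is easy to check numerically. The sketch below uses 8-bit values in OpenCV's B, G, R channel order. It adds the primaries, clipping at full intensity (255), to get the secondary colors and white. The mix helper is hypothetical.

```python
import numpy as np

# 8-bit colors in OpenCV's B, G, R order
RED = np.array([0, 0, 255], dtype=np.uint8)
GREEN = np.array([0, 255, 0], dtype=np.uint8)
BLUE = np.array([255, 0, 0], dtype=np.uint8)


def mix(*colors):
    """Additive mixing: add the light of each color, clipping at 255."""
    total = np.sum([c.astype(np.int32) for c in colors], axis=0)
    return np.clip(total, 0, 255).astype(np.uint8)


yellow = mix(RED, GREEN)        # [0, 255, 255]
cyan = mix(BLUE, GREEN)         # [255, 255, 0]
magenta = mix(RED, BLUE)        # [255, 0, 255]
white = mix(RED, GREEN, BLUE)   # [255, 255, 255]

# Equal amounts of B, G and R lie on the grayscale line
mid_gray = np.array([128, 128, 128], dtype=np.uint8)
```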

Use: color monitors and most video cameras.

The CMYK Color Model

CMY stands for cyan, magenta, and yellow, also known as the secondary colors of light. K refers to black. Equal proportions of C, M, and Y produce a muddy black rather than a pure black. That’s why the CMYK model is used instead of CMY.

This is a subtractive model, i.e. colors are perceived as a result of reflected light. For example, when light falls on a cyan-coated surface, red is absorbed (or subtracted) while green and blue are reflected, and thus G + B = Cyan. Magenta and yellow work similarly.

Thus, CMY can be obtained from RGB by subtracting the RGB values from the maximum intensity. For values normalized to [0, 1], C = 1 − R, M = 1 − G, and Y = 1 − B.
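A small sketch of the conversion, for RGB values normalized to [0, 1] (for 8-bit images, use 255 − value instead). The CMYK function follows one common convention: K = 1 − max(R, G, B), and C, M, Y are rescaled by (1 − K). Printers may use other, device-specific conversions.

```python
import numpy as np


def rgb_to_cmy(rgb):
    """CMY = max intensity - RGB, for values in [0, 1] (works on arrays too)."""
    return 1.0 - np.asarray(rgb, dtype=np.float64)


def rgb_to_cmyk(rgb):
    """Convert a single RGB color in [0, 1] to (C, M, Y, K)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    k = 1.0 - rgb.max()                # black component
    if k == 1.0:                       # pure black: avoid dividing by zero
        return np.array([0.0, 0.0, 0.0, 1.0])
    cmy = (1.0 - rgb - k) / (1.0 - k)
    return np.append(cmy, k)


print(rgb_to_cmy([1.0, 0.0, 0.0]))     # red -> [0. 1. 1.]
print(rgb_to_cmyk([0.5, 0.5, 0.5]))    # mid gray -> [0.  0.  0.  0.5]
```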

Use: printing (books, magazines, etc.).

The HSI Color Model

HSI stands for Hue, Saturation, and Intensity. This model is close to how humans perceive and describe color. Let’s understand the HSI terms:

Hue: Color attribute that describes the pure color or dominant wavelength.

Saturation: Purity of Color or how much a pure color is diluted by white light.

Intensity: Amount of light

H and S tell us about the chromaticity (color information) of the light while I carries the greyscale information.

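The HSI model can be obtained by rotating the RGB cube so that it stands on its black vertex, with black at the bottom and white at the top. The grayscale diagonal then becomes the vertical intensity axis.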

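Hue is measured as an angle: red lies at 0°, green at 120°, and blue at 240°. So 0°–120° spans red to green, 120°–240° spans green to blue, and 240°–360° spans blue back to red. Saturation ranges from 0 to 100%. The range of intensity values depends on the bit depth of the image (e.g. 0–255 for 8 bits).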
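OpenCV has a built-in conversion to HSV (cv2.COLOR_BGR2HSV), a closely related model, but it has none for HSI. The sketch below therefore implements the standard geometric RGB-to-HSI formulas by hand for a single color. Saturation is returned in [0, 1] (multiply by 100 for a percentage). For grays (R = G = B), the hue is undefined and is reported as 0 here, which is a convention chosen for this sketch.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert one RGB color with components in [0, 1] to HSI.

    Returns (H in degrees, S in [0, 1], I in [0, 1]).
    """
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:                                   # gray: hue is undefined
        h = 0.0
    else:
        cos_theta = max(-1.0, min(1.0, num / den))  # guard rounding errors
        theta = math.degrees(math.acos(cos_theta))
        h = theta if b <= g else 360.0 - theta
    return h, s, i


print(rgb_to_hsi(1.0, 0.0, 0.0))   # red: hue 0, fully saturated
print(rgb_to_hsi(0.0, 0.0, 1.0))   # blue: hue 240 degrees
```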

Pros: Because it separates color information (H and S) from intensity (I), describing and manipulating colors is easier than with the RGB model.

Note: We can also use these color models for object tracking (See here).

In the next blog, we will see how different colors can be generated from these color models with the help of OpenCV. Hope you enjoy reading.


Changing Video Resolution using OpenCV-Python

In this tutorial, I will show how to change the resolution of a video using OpenCV-Python. This blog is based on the interpolation methods (Chapter 5) that we discussed earlier.

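Here, I will convert a 640×480 video to 1280×720. Note that this also changes the aspect ratio from 4:3 to 16:9, so the frames will be stretched horizontally. Let’s see how to do this: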


  1. Load a video using cv2.VideoCapture()
  2. Create a VideoWriter object using cv2.VideoWriter()
  3. Extract frame by frame
  4. Resize the frames using cv2.resize()
  5. Write the resized frames to a video file using the write() method of the VideoWriter object
  6. Release the VideoCapture and VideoWriter objects and destroy all windows


Here, I have used bicubic interpolation (cv2.INTER_CUBIC), but you can use any other method. Hope you enjoy reading.


Bayer filter

In the previous blog, we saw different methods by which a color image can be obtained from an image sensor. Of these methods, the Bayer filter is the most widely used today, and in this blog we will discuss it in detail.

To form a color image, we need to collect information at the R, G, and B wavelengths for every pixel (sensor element). But capturing all three values at every pixel is expensive, both in time and in money.

So in 1976, Bryce Bayer came up with an alternative. Instead of capturing all of R, G, and B at each pixel, he proposed capturing only one of them per pixel, so each pixel contains either R, G, or B. He assigned 50% of the pixels to green and split the rest equally between red and blue (25% each) to mimic the human eye, which is most sensitive to green. These pixels are arranged in a pattern as shown below:

The missing color values are then estimated with an interpolation (color demosaicing) algorithm. For example, a pixel that captured red needs its green and blue values estimated from its neighbors, and so on. We will study interpolation algorithms in more detail in the next blog.

The overall procedure from Bayer to RGB color image can be summarized as

Source: Thesis

So, with a Bayer filter, we store only one color value (R, G, or B) at each pixel. This reduces cost and computation while maintaining good image quality, and that’s why it is so widely used.

Hope you now understand the Bayer filter, why it is used, and how a color image is obtained from a Bayer image. Hope you enjoy reading.


Understanding Images with OpenCV-Python

In the previous blogs, we learned about pixels, intensity values, color, and greyscale images. Now, with OpenCV and NumPy, let’s try to visualize these concepts.

First, we need an image. You can either load one or create your own. Loading an image from the device looks like this:

To access a pixel, we must first know the shape of the image. This can be done with

It returns a tuple of the number of rows, columns, and channels (if the image is color).

The total number of elements (pixel values) can be found either by multiplying the rows, columns, and channels obtained from img.shape, or by using the following command

Once we know the image shape, we can access a pixel by its row and column coordinates as
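Here is a self-contained sketch of these three steps (shape, total size, and pixel access) on a NumPy stand-in for a loaded image. Note that a greyscale image has only two entries in its shape. Unpacking three values from it fails, so img.shape[:2] is the safer way to get rows and columns for both kinds of image.

```python
import numpy as np

# Stand-in for a loaded color image: 100 rows, 200 columns, 3 channels (BGR)
img = np.zeros((100, 200, 3), dtype=np.uint8)
img[10, 20] = (255, 0, 0)          # make the pixel at row 10, column 20 blue

rows, cols, channels = img.shape   # (100, 200, 3)
total = img.size                   # rows * cols * channels = 60000 values
pixel = img[10, 20]                # B, G, R values: [255, 0, 0]
blue_value = img[10, 20, 0]        # a single channel value: 255

gray = np.zeros((100, 200), dtype=np.uint8)
print(gray.shape)                  # (100, 200): no channel entry for greyscale
```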

This returns the intensity value at that pixel location. For a greyscale image, intensity or pixel value is a single integer while for a color image, it is an array of Blue, Green, Red values.

Note: OpenCV reads color images in BGR order, not RGB. Be careful, for example when displaying an OpenCV image with Matplotlib, which expects RGB.

We know that the number of intensity levels depends on the number of bits per pixel, which can be found by
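A sketch, assuming an integer-typed image: the bit depth comes from the array's data type (img.dtype), and the number of levels is 2 raised to that many bits. The helper name is hypothetical.

```python
import numpy as np


def intensity_levels(img):
    """Number of representable intensity levels for an integer image."""
    bits = img.dtype.itemsize * 8      # bits per value, e.g. 8 for uint8
    return 2 ** bits


img = np.zeros((4, 4), dtype=np.uint8)   # stand-in for a loaded image
print(img.dtype)                         # uint8
print(intensity_levels(img))             # 256 (values 0..255)
```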

To access the color channels separately (in B, G, R order), use NumPy indexing as shown below

You can change the pixel value just by normal assignment as shown below
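A sketch of channel indexing and assignment on a NumPy stand-in image follows. One detail worth knowing: slices such as img[:, :, 0] are views, so they change when img changes. Use .copy() or cv2.split() when you need independent copies.

```python
import numpy as np

img = np.zeros((50, 50, 3), dtype=np.uint8)   # stand-in BGR image

# Channels via NumPy indexing (OpenCV order is B, G, R); these are views
b = img[:, :, 0]
g = img[:, :, 1]
r = img[:, :, 2]

# Change pixel values by normal assignment
img[10, 10] = (0, 0, 255)        # a single pixel -> red
img[20:30, 20:30] = (0, 255, 0)  # a 10x10 block -> green
img[:, :, 0] = 255               # the whole blue channel -> 255

print(img[25, 25])               # [255 255   0]: green block + blue channel
```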

You can change the color image to greyscale using the following command

Since images are NumPy arrays, operations you can perform on arrays, such as addition and subtraction, also apply to images.

Play with all these commands to understand better. Hope you enjoy reading.


Read, Write and Display Videos with OpenCV Python

In this blog, we will see how to read, write, and display videos using OpenCV. Since a video is made up of images (frames), most of the commands we learned in the previous blog also apply here.

Let’s see an example of how to capture video from the camera and display it.

cv2.VideoCapture(0) will open the default camera. You can select the second camera by passing 1, third by passing 2 and so on. This creates a VideoCapture object (“cap” here).

cap.read() captures the video frame by frame. It returns two values, ret and frame. If the frame is read correctly, ret is True; otherwise it is False (for example, at the end of a video file).

cv2.waitKey(1) & 0xFF == ord('q') waits 1 ms for a key press on each frame and exits the video when 'q' is pressed.

cap.release() closes the video file or capturing device.

If you want to play a video from a file, just replace cv2.VideoCapture(0) in the above code with the file path, e.g. cv2.VideoCapture('F:/downloads/Python.mp4'). Also, use an appropriate delay in cv2.waitKey() (25 ms works well). With a delay of 1 ms, the video plays too fast.

Saving a Video:

First, create a VideoWriter object with cv2.VideoWriter(filename, fourcc, fps, frameSize). fourcc (the 4-character code of the codec) specifies how the frames are compressed, e.g. cv2.VideoWriter_fourcc(*'XVID'), and frameSize is (width, height). After creating this object, use its write() method to save each frame. Let’s see an example

By now, you should have a feel for the basic video commands in OpenCV. Hope you enjoy reading.


Read, Write and Display Images with OpenCV

In this blog, we will see how to read, write, and display images using OpenCV. I hope you have installed the OpenCV, NumPy, and Matplotlib libraries. If not, please refer to this blog.

Read an image:

To read an image, use the function cv2.imread(filename[, flags]), where filename is the path of the image and flags specifies how the image should be read: >0 (e.g. cv2.IMREAD_COLOR) for color, 0 (cv2.IMREAD_GRAYSCALE) for greyscale, and <0 (cv2.IMREAD_UNCHANGED) to load the image as is, including any alpha channel.

If the image cannot be read (because of a missing file, improper permissions, or an unsupported or invalid format), the function returns an empty matrix instead of raising an error. In Python, this empty result is None.

Display an image:

To display an image in a window, use the function cv2.imshow(winname, image), where the first argument is the name of the window and the second is the image to be shown. So, this will first create a window named image and then display the image in that window.

Note: This function must be followed by cv2.waitKey(delay); otherwise, the image won’t be displayed.

cv2.waitKey(delay) waits for a key press for delay milliseconds, and this is what keeps the image on screen. If delay is <= 0, it waits indefinitely until a key is pressed. Otherwise, it returns after delay milliseconds, or earlier if a key is pressed. It returns the code of the pressed key, or -1 if no key was pressed.

cv2.destroyAllWindows() simply destroys all the windows we created.

Special Case: We can create a window first and load the image into it later. Just write the code line below before the cv2.imshow() function.

Write an image:

To save an image, use the function cv2.imwrite(filename, image) where the first argument is the file name with which we want to save the image file, the second argument is the image you want to save.

The file format is chosen from the extension of the file name, so a name ending in .jpg saves the image in JPEG format in the working directory.

By now, you should have a feel for the basic image commands in OpenCV. Hope you enjoy reading.


Installing Python OpenCV and other libraries

Here, we will be installing the libraries that are required to perform image processing operations. We will be using the following Python libraries

  1. OpenCV
  2. Numpy
  3. Matplotlib

Why are we using OpenCV and not MATLAB or something else?

  1. Because it is open source, fast (written in C/C++), memory efficient, and easy to install (it can run on most devices that can run C/C++).
  2. If you really want to learn how computer vision works from the first steps to the last, I suggest you learn OpenCV first. Then use a deep learning library like TensorFlow or PyTorch when you’re ready to train a deep learning model for better performance/accuracy.

Installing NumPy, OpenCV or any library (For WINDOWS)

There are two ways to install a library for Python:

  1. Using the pip command: in the installed Python folder, go to the Scripts folder and open a command prompt (press and hold Shift, right-click, and select Open command window here). Then type pip install followed by the library name. For example: pip install numpy. Note that the OpenCV package is named opencv-python, so use pip install opencv-python.
  2. Using a .whl file: first download the .whl file of the library (the version corresponding to your Python) from here. Then open a command prompt in the folder where you downloaded the file and type pip install filename.whl.

Installing NumPy, OpenCV or any library (For UBUNTU)

First, install pip using apt-get; then you can install any library as shown below

Anaconda: open the Anaconda prompt and write pip install numpy or any other library name which you want to install.
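Whichever method you use, a quick way to confirm the installation is to import each library and print its version. In the sketch below (a hypothetical helper), a missing library is reported instead of crashing the script. Note that OpenCV is imported as cv2, even though the pip package is named opencv-python.

```python
import importlib


def installed_versions(names=("cv2", "numpy", "matplotlib")):
    """Map each module name to its __version__, or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```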

By now, you should have a feel for how to install Python libraries. Hope you enjoy reading.
