
Creating a Snake Game using OpenCV-Python

Isn’t it interesting to create a snake game using OpenCV-Python? And what if I tell you that you only need the following functions:

  • cv2.imshow()
  • cv2.waitKey()
  • cv2.putText()
  • cv2.rectangle()

So, let’s get started.

Import Libraries

For this, we only need four libraries.
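A minimal sketch of these imports (the exact set is an assumption; random and time are used later for apple placement and frame timing):

import cv2
import numpy as np
import random
import time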

Displaying Game Objects

  • Game Window: Here, I have used a 500×500 image as my game window.
  • Snake and Apple: I have used green squares to display the snake and a red square for the apple. Each square has a size of 10 units.
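A hedged sketch of how these objects can be drawn with cv2.rectangle(); the starting positions, colors and window name here are assumptions:

# 500x500 black image as the game window; snake as green 10x10 squares, apple as a red square
img = np.zeros((500, 500, 3), dtype='uint8')
snake_head = [250, 250]
snake_position = [[250, 250], [240, 250], [230, 250]]      # hypothetical starting body
apple_position = [random.randrange(1, 50) * 10, random.randrange(1, 50) * 10]
score = 0

cv2.rectangle(img, (apple_position[0], apple_position[1]),
              (apple_position[0] + 10, apple_position[1] + 10), (0, 0, 255), 3)
for position in snake_position:
    cv2.rectangle(img, (position[0], position[1]),
                  (position[0] + 10, position[1] + 10), (0, 255, 0), 3)
cv2.imshow('Snake Game', img)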

Game Rules

Now, let’s define some game rules

  • Collision with boundaries: If the snake collides with the boundaries, it dies.
  • Collision with self: If the snake collides with itself, it should die. For this, we only need to check whether the snake’s head is in the snake’s body or not.
  • Collision with apple: If the snake collides with the apple, the score is increased and the apple is moved to a new location.

Also, on eating an apple, the snake’s length should increase; otherwise, the snake keeps moving as it is.

  • The snake game has a fixed time window for a keypress. If you press a key within that time, the snake moves in that direction; otherwise, it continues moving in the previous direction. Sadly, with OpenCV’s cv2.waitKey() function, if you hold down a direction key, the snake starts moving faster in that direction. So, to make the snake’s movement uniform, I did something like this.
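A minimal sketch of the idea: wait a fixed amount of time per frame and keep only the first key pressed in that window (the 200 ms frame duration is a placeholder):

t_end = time.time() + 0.2        # hypothetical frame duration of 200 ms
k = -1
while time.time() < t_end:
    if k == -1:
        k = cv2.waitKey(125)     # returns -1 when no key is pressed
    else:
        cv2.waitKey(1)           # keep pumping events, ignore further presses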

Because cv2.waitKey() returns -1 when no key is pressed, ‘k’ stores the first key pressed within that window. Since the while loop runs for a fixed time, it doesn’t matter how quickly you pressed a key; the loop always waits for the same duration.

  • Snake cannot move backward: Here, I have used the w, a, s, d keys for moving the snake. If the snake was moving right and we press the left key, it keeps moving right; in short, the snake cannot directly reverse direction.

After seeing which direction key is pressed, we change the head position accordingly.
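A sketch of both steps together; the direction codes 0/1/2/3 for left/right/down/up are a hypothetical encoding, not taken from the original code:

if k == ord('a') and prev_button_direction != 1:      # cannot go left while moving right
    button_direction = 0
elif k == ord('d') and prev_button_direction != 0:    # cannot go right while moving left
    button_direction = 1
elif k == ord('s') and prev_button_direction != 3:    # cannot go down while moving up
    button_direction = 2
elif k == ord('w') and prev_button_direction != 2:    # cannot go up while moving down
    button_direction = 3
else:
    button_direction = prev_button_direction
prev_button_direction = button_direction

# move the head one 10-unit square in the chosen direction
if button_direction == 1:
    snake_head[0] += 10
elif button_direction == 0:
    snake_head[0] -= 10
elif button_direction == 2:
    snake_head[1] += 10
elif button_direction == 3:
    snake_head[1] -= 10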

Displaying the final Score

For displaying the final score, I have used the cv2.putText() function.
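A hedged sketch of this step (the text position, font and message are placeholders):

img = np.zeros((500, 500, 3), dtype='uint8')
cv2.putText(img, 'Your Score is {}'.format(score), (140, 250),
            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
cv2.imshow('Snake Game', img)
cv2.waitKey(0)
cv2.destroyAllWindows()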

Finally, our snake game is ready and looks like this

The full code can be found here.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Python Curses

In this tutorial we will learn how to create a snake game using python and curses.

What is Curses?

Curses is a library that can be used to create text-based user interface applications. It is a terminal-control library, i.e. code written using curses can only be run through a terminal.

Import Libraries

To start with creating a snake game using curses, we first need to import the following libraries:
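A minimal sketch of the imports, assuming the game only needs curses and a random number generator for the apple position:

import curses
from random import randint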

The above import works fine on Linux-based systems. To make it work on Windows, you need to install curses: download the curses package matching your Python version from the Python extension packages page and then run the following command:
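A hedged example of that command; the wheel filename is a placeholder, replace it with the file you actually downloaded:

pip install curses-2.2-cp36-cp36m-win_amd64.whl   # use the name of your downloaded .whl file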

Initializing Game Screen

After importing the required libraries, we first need to initialize the game screen and get the maximum height and width of the opened terminal screen. Using this height and width, we will create a window for the game.
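A minimal sketch of this setup, assuming the window spans the whole terminal:

sc = curses.initscr()              # initialize the game screen
h, w = sc.getmaxyx()               # maximum height and width of the terminal
win = curses.newwin(h, w, 0, 0)    # game window covering the whole screen
win.keypad(1)
curses.curs_set(0)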

In the above code, win.keypad(1) enables the window to read the user’s key presses (including the arrow keys) for the game. Also, curses.curs_set(0) makes the cursor invisible on the screen.

Initialize snake and apple initial positions

Next, we will initialize the starting positions of the snake and the apple (food) on the game screen. Also, we will initialize our game score to zero.
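A hedged sketch of the initialization; the exact starting coordinates are assumptions:

snake_position = [[h // 2, w // 4], [h // 2, w // 4 - 1], [h // 2, w // 4 - 2]]  # [y, x] cells
apple_position = [h // 2, w // 2]
score = 0
win.addch(apple_position[0], apple_position[1], curses.ACS_DIAMOND)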

In the above code, win.addch() adds a diamond-like symbol on the game screen at the specified apple position.

Specifying Game Over Conditions

For the snake game, there are basically two conditions that define how the game ends: first, if the snake collides with one of the game window boundaries, and second, if the snake collides with itself.
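A hedged sketch of these two checks, to be called inside the game loop:

def collision_with_boundaries(head, h, w):
    # the window border occupies row/column 0 and h-1 / w-1
    return head[0] in (0, h - 1) or head[1] in (0, w - 1)

def collision_with_self(snake_position):
    # the head is the first cell; dying means it re-enters the body
    return snake_position[0] in snake_position[1:]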

Playing the Game

In this game, we will use four keyboard keys: ‘up’, ‘down’, ‘left’ and ‘right’. To get the user’s input from the keyboard, we use the win.getch() function.
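A sketch of the start of the main loop; the 150 ms timeout and the starting direction are assumptions:

prev_button_direction = curses.KEY_RIGHT     # assume the snake starts moving right
win.timeout(150)                             # wait up to 150 ms for a key each frame

while True:
    win.border(0)
    next_key = win.getch()                   # returns -1 if no key was pressed in time
    key = prev_button_direction if next_key == -1 else next_key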

In the above code win.border(0) will create a border around our game screen.

Now we will see the logic to move the snake and eat the apple. According to the game rules, the snake continues to move in the same direction if the user does not press any key. Also, the snake cannot move backward. If the user presses a key, we need to update the snake head’s position. Let’s see the code:
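A hedged sketch continuing inside the while loop from the previous snippet:

    # ignore a key press that would make the snake reverse onto itself
    if key == curses.KEY_LEFT and prev_button_direction != curses.KEY_RIGHT:
        button_direction = curses.KEY_LEFT
    elif key == curses.KEY_RIGHT and prev_button_direction != curses.KEY_LEFT:
        button_direction = curses.KEY_RIGHT
    elif key == curses.KEY_UP and prev_button_direction != curses.KEY_DOWN:
        button_direction = curses.KEY_UP
    elif key == curses.KEY_DOWN and prev_button_direction != curses.KEY_UP:
        button_direction = curses.KEY_DOWN
    else:
        button_direction = prev_button_direction
    prev_button_direction = button_direction

    # compute the new head one cell in the chosen direction
    new_head = [snake_position[0][0], snake_position[0][1]]
    if button_direction == curses.KEY_RIGHT:
        new_head[1] += 1
    elif button_direction == curses.KEY_LEFT:
        new_head[1] -= 1
    elif button_direction == curses.KEY_UP:
        new_head[0] -= 1
    elif button_direction == curses.KEY_DOWN:
        new_head[0] += 1
    snake_position.insert(0, new_head)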

Next, there are two situations: either the snake simply moves to a new position in the next step, or it eats the apple. If the snake only moves, we add one unit at its head in the pressed direction and remove one unit from its tail. If the snake eats the apple, we add one unit at its head but do not remove anything from its tail, and we display a new apple at a different location. Let’s see the code:
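A hedged sketch, still inside the game loop (the snake body character ‘#’ is an assumption):

    if snake_position[0] == apple_position:                     # the snake eats the apple
        score += 1
        apple_position = [randint(1, h - 2), randint(1, w - 2)]
        win.addch(apple_position[0], apple_position[1], curses.ACS_DIAMOND)
    else:                                                        # the snake just moves
        tail = snake_position.pop()
        win.addch(tail[0], tail[1], ' ')                         # erase the old tail
    win.addch(snake_position[0][0], snake_position[0][1], '#')   # draw the new head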

Then, finally, we will display the score on the screen and quit the game window.
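A sketch of this last step, run once a game-over condition is met:

sc.addstr(h // 2, w // 2 - 7, 'Final Score: ' + str(score))
sc.refresh()
sc.getch()            # wait for a key press before closing
curses.endwin()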

Here are some images of the game that we have just created.

The full code can be found here.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API – Part IV

In the last blog, we trained the model and saved the inference graph. In this blog, we will learn how to use this inference graph for object detection and how to run our snake game using this trained object detection model.

To play the snake game using this trained model, you first need a snake game. But don’t worry, you need not develop it from scratch; you can clone this repository. And if you want to know the algorithm behind this code, you can follow this blog.

Now that we have our snake game, the next thing is to use this object detection model to play it. To do this, we need to run both the snake game file and the following script from the models/research folder simultaneously.
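The full script is longer; below is a condensed, hedged sketch of its core loop, assuming TensorFlow 1.x and the standard object_detection utilities on the Python path. The paths, score threshold and gesture-to-key mapping are placeholders, not the original values.

import cv2
import numpy as np
import pyautogui
import tensorflow as tf
from object_detection.utils import label_map_util

PATH_TO_CKPT = 'snake/frozen_inference_graph.pb'      # path to the exported inference graph
PATH_TO_LABELS = 'images/object-detection.pbtxt'      # path to the label map
NUM_CLASSES = 4

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# load the frozen graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

keys = {'up': 'w', 'down': 's', 'left': 'a', 'right': 'd'}   # hypothetical gesture-to-key mapping
cap = cv2.VideoCapture(0)

with detection_graph.as_default(), tf.Session(graph=detection_graph) as sess:
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        (b, s, c) = sess.run([boxes, scores, classes],
                             feed_dict={image_tensor: np.expand_dims(frame, axis=0)})
        if s[0][0] > 0.5:                                    # best detection above a threshold
            gesture = category_index[int(c[0][0])]['name']
            pyautogui.press(keys[gesture])                   # send the key to the snake game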

In the above code, we need to specify the path to our inference graph using the PATH_TO_CKPT variable. We also need to set the PATH_TO_LABELS variable to the path of the object-detection.pbtxt file, and specify the number of classes, i.e. 4 in our case.

In the above script, we have used pyautogui to press the corresponding key when the hand gesture for a particular direction is detected.

Finally, you can play the snake game using your hand gestures. Let’s see some of the results.

Pretty well, yeah. This is all for playing the snake game using the TensorFlow object detection API. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API – Part III

In the previous blogs we have seen how to generate data for object detection and convert it into TFRecord format to train the model. In this blog we will learn how to use this data to train the model.

To train the model, we will take a pre-trained model and use transfer learning to train it on our dataset. I have used the MobileNet pre-trained model; here is the mobilenet model. For its configuration file, go to models -> research -> object_detection -> samples -> configs -> ssd_mobilenet_v1_pets.config.
The downloaded configuration file needs to be edited as per our requirements. In the configuration file, we have changed the number of classes, the number of training steps, the path to the model checkpoint, and the paths to the pbtxt files, as shown below.
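A hedged excerpt of the kind of edits made to ssd_mobilenet_v1_pets.config; the '...' marks omitted parts of the file, and the step count and paths are placeholders to adjust to your own layout:

model {
  ssd {
    num_classes: 4                              # up, down, left, right
    ...
  }
}
train_config: {
  fine_tune_checkpoint: "images/model.ckpt"     # downloaded MobileNet checkpoint
  num_steps: 10000                              # placeholder step count
  ...
}
train_input_reader: {
  tf_record_input_reader { input_path: "images/data/train.record" }
  label_map_path: "images/object-detection.pbtxt"
}
eval_input_reader: {
  tf_record_input_reader { input_path: "images/data/test.record" }
  label_map_path: "images/object-detection.pbtxt"
}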

For the object-detection.pbtxt file, create a pbtxt file and put the following text inside it to specify the labels for our problem.
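A hedged example of the label map; the class names match the four gestures, but their order and ids here are assumptions and must match the condition used in generate_tfrecord.py:

item {
  id: 1
  name: 'up'
}
item {
  id: 2
  name: 'down'
}
item {
  id: 3
  name: 'left'
}
item {
  id: 4
  name: 'right'
}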

Now go to models -> research -> object_detection -> legacy and copy the train.py file to the models -> research folder.

Then create a folder named images inside the models -> research folder. Put your MobileNet model, configuration file, train and test image data folders, and train and test CSV label files inside it. Inside the training_data folder, create a folder named data and put your train and test TFRecord files there. The hierarchy will look like this:

Also create a training folder inside the images folder, where the model will save its checkpoints. Now, from the models -> research folder, run the following command to train the model.
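A typical invocation of the legacy train.py; the paths assume the folder layout described above, so adjust them to your own:

python train.py --logtostderr --train_dir=images/training --pipeline_config_path=images/ssd_mobilenet_v1_pets.config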

The training time will depend on your machine configuration and the number of steps you have specified in the configuration file.

Now we have our trained model and its checkpoints are saved inside the models/research/images/training folder. In order to test this model and use this model to detect objects we need to export the inference graph.

To do this, first copy models/research/object_detection/export_inference_graph.py to the models/research folder. Then, inside the models/research folder, create a folder named “snake” which will store the inference graph. From the models -> research folder, run the following command:
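A typical form of the export command; the checkpoint number is a placeholder, use the latest checkpoint saved in images/training:

python export_inference_graph.py --input_type image_tensor --pipeline_config_path images/ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix images/training/model.ckpt-10000 --output_directory snake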

Now we have frozen_inference_graph.pb inside the models/research/snake folder, which will be used to detect objects using the trained model.

This is all for training the model and saving the inference graph. In the next blog, we will see how to use this inference graph for object detection and how to run our snake game using this trained object detection model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part IV

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API – Part II

In the previous blog, we did two things: first, we created a dataset, and second, we split it into training and test sets. In this blog, we will learn how to convert this dataset into the TFRecord format for training.

Before creating the TFRecord files, we just need to do one more step. In the last blog, we generated XML files using LabelImg. To get labels for the training and test datasets, we need to convert these XML files into CSV format. To do this, we will use the following code, which has been taken from this repository.
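A hedged reconstruction of that conversion script (the widely used xml_to_csv utility); the 'images/train' and 'images/test' paths in main() are placeholders:

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),    # image width
                     int(root.find('size')[1].text),    # image height
                     member[0].text,                     # class label
                     int(member[4][0].text),             # xmin
                     int(member[4][1].text),             # ymin
                     int(member[4][2].text),             # xmax
                     int(member[4][3].text))             # ymax
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(xml_list, columns=column_name)


def main():
    for folder in ['train', 'test']:
        xml_df = xml_to_csv(os.path.join('images', folder))   # set your own XML folders here
        xml_df.to_csv('{}_labels.csv'.format(folder), index=None)


main()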

In the above main function, you should specify the paths to your XML files for both the train and test folders. The generated CSV files will contain the filename, width and height of each image, the output label, and the coordinates of the annotated rectangular box, as shown in the figure below.

Once you have your train and test images with labels in CSV format, let’s convert data in TFRecord format.

A TFRecord file stores your data as a sequence of binary strings, which has many advantages over normal data formats. To create it, we will use the following code, which has been taken from this repository. According to your requirements, you need to change the condition for labels at line 31 below.
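The full generate_tfrecord.py script is in the linked repository; for illustration, the label condition it refers to is usually a small helper like this (the class names and ids here are assumptions and must match your object-detection.pbtxt):

def class_text_to_int(row_label):
    # map each class name from the CSV to the id used in the label map
    if row_label == 'up':
        return 1
    elif row_label == 'down':
        return 2
    elif row_label == 'left':
        return 3
    elif row_label == 'right':
        return 4
    else:
        return None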

Save the full script from the repository in a file named generate_tfrecord.py. Now, in order to use this code, we first need to clone the TensorFlow object detection API. For that, do the following:
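The standard way to get the object detection API is to clone the TensorFlow models repository:

git clone https://github.com/tensorflow/models.git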

Then we need to do the following steps to avoid getting a protoc error:

  1. Go to this release link and download protobuf according to your operating system.
  2. Extract the downloaded file and go to the bin folder inside it.
  3. Copy the protoc.exe file and put it in the models -> research -> object_detection -> protos folder.
  4. In the protos folder, run the following command for the .proto files.
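A commonly used form of this command; note that many setups run it from the models/research folder instead, so adjust the working directory and the path to protoc.exe as needed:

protoc object_detection/protos/*.proto --python_out=.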

After cloning this repository, copy generate_tfrecord.py into the models -> research folder and run the following commands.
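A typical pair of invocations, assuming the CSV files generated earlier are named train_labels.csv and test_labels.csv (adjust the paths to your layout):

python generate_tfrecord.py --csv_input=images/train_labels.csv --output_path=train.record
python generate_tfrecord.py --csv_input=images/test_labels.csv --output_path=test.record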

The above commands will generate two files named train.record and test.record, which will be used for training the model.

This is all for generating the TFRecord files. In the next blog, we will perform training and testing of the object detection model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part III

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API

Here, we will learn how to use tensorflow object detection API with the computer’s webcam to play a snake game. We will use hand gestures instead of the keyboard.

We will use following steps to play snake game using tensorflow object detection API:

  1. Generate dataset.
  2. Convert train and test datasets into tfrecord format.
  3. Train a pre-trained model using generated data.
  4. Integrate trained model with snake game.
  5. Play the snake game using your own hand gestures.

In this blog, we will cover only the first step, i.e. how to create your own training data; the remaining steps will be covered in subsequent blogs.

You can find code here.

Generate Dataset

A snake game generally involves four directions of movement: up, down, right and left. We need to generate at least 100 images per direction. You can use your phone or laptop camera to do this. Try to capture images with different backgrounds for better generalization. Below are some example images of hand gestures.

Hand Gestures

Now we have our captured images of hand gestures. The next step is to annotate these images according to their classes, which means we need to draw rectangular boxes around the hand gestures and label them appropriately. Don’t worry, there is a tool named LabelImg which is very helpful for annotating images to create training and test datasets. To get started with LabelImg, you can follow their GitHub link. The start screen of LabelImg looks like this.

At the left side of the screen, you can find various options. Click on Open Dir and choose the input image folder. Then click on Change Save Dir and select the output folder where the generated XML files will be saved. Each XML file contains the coordinates of the rectangular boxes drawn in the image, something like this.

To create a rectangular box in an image using LabelImg, you just need to press ‘W’, then draw the box and save it. You can create one or multiple boxes in one image, as shown in the figure below. Repeat this for all the images.

Now we have the images and their corresponding XML files. Next, we will split this dataset into training and test sets in a 90/10 ratio: put 90% of the images of each class (‘up’, ‘right’, ‘left’ and ‘down’) and their corresponding XML files in one folder, and the other 10% in another folder.

That’s all for creating the dataset. In the next blog, we will see how to create TFRecord files from these datasets, which will be used for training the model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part II

 Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game using Real Time Speech Recognition

Speech recognition can be very helpful in your daily activities: you can switch your laptop on and off, control your TV and AC, and handle other home appliances. In this blog, we will learn a fun activity, playing a snake game using voice control. Once you learn this method, you can apply it to other real-life applications.

To perform speech recognition, you need to train a model, and training a model requires a large amount of data and a lot of time. To save this time, I was searching for a pre-trained model. There are some open-source APIs, but they are not that accurate or fast. Luckily, I got to know about Porcupine. Porcupine is a self-service, highly accurate, and lightweight wake word (voice control) engine. In this blog, I will show you how to play a snake game using the Porcupine GitHub repository.

To play a snake game, you first need to develop one. But don’t worry, you need not develop it from scratch; you can clone this repository. And if you want to know the algorithm behind this code, you can follow this blog.

Now that you have a snake game, the next thing is how to use speech recognition to play it. First, clone the Porcupine repository into your system. It has some pre-trained wake words defined in it, and you can also create your own. For this problem, I have used four wake words: “go left”, “go right”, “go down” and “snake up”.

Here are the steps to play snake game using voice control:

  • First, go to the Porcupine directory that you have cloned.
  • Then go to tools -> optimizer -> System (windows, linux or mac) -> os type (64 or 32 bit).
  • Then use the pv_porcupine_optimizer.exe file to create the wake word files. To do this, you need the command shown after this list.
  • Here -r corresponds to the resource directory, which you can find inside the Porcupine directory; -w corresponds to the wake word of your choice; -o corresponds to the output directory for your wake word file; and -p corresponds to your platform (windows, linux or mac).
  • To generate the four different wake words, I ran the above command four times.
  • Now that you have created your wake words, the next thing is to integrate them with your snake game Python code.
  • Inside the Porcupine folder, go to binding -> python. There is a file named porcupine.py.
  • Open the porcupine.py file and append the following code at the end of it. You will also need to install pyaudio and pyautogui using pip.
  • Now run both the porcupine.py file and your snake game code to play it with voice control.
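For illustration, one invocation of the optimizer mentioned above might look like this; the resource and output paths and the platform are placeholders, and you run it once per wake word:

pv_porcupine_optimizer.exe -r resources/ -w "go left" -o output/ -p windows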

Now you have an idea of how to use real-time speech recognition. Hopefully you can find some real-life applications to apply it to.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

2D Histogram

In the last blog, we discussed 1-D histograms, in which we analyze each channel separately. Suppose we want to find the correlation between image channels, say how many times a (red, green) pair of (100, 56) appears in an image. In such a case, a 1-D histogram fails, as it does not show the relationship between the intensities of the two channels at the same position.

To solve this problem, we need multi-dimensional histograms, such as 2-D or 3-D. With the help of 2-D histograms, we can analyze the channels together in groups of 2 (RG, GB, BR), or all together with 3-D histograms. Let’s see what a 2-D histogram is and how to construct one using OpenCV-Python.

A 2-D histogram counts the occurrence of combinations of intensities. Below figure shows a 2D histogram

Here, the Y and X-axis correspond to the Red and Green channel ranges (for 8-bit, [0, 255]), and each point within the histogram shows the frequency corresponding to each R and G pair. Frequency is color-coded here; otherwise, another dimension would be needed.

Let’s understand how to construct a 2-D histogram by taking a simple example.

Suppose we have 4×4, 2-bit images of the Red and Green channels (as shown below) and we want to plot their 2-D histogram.

  • First, we plot the R and G channel ranges (here, [0, 3]) on the X and Y-axis respectively. This will be our 2-D histogram.
  • Then, loop over each position within the channels, find the frequency of the corresponding intensity pair and plot it in the 2-D histogram. These frequencies are then color-coded for ease of visualization.

Now, let’s see how to construct a 2-D histogram using OpenCV-Python

We use the same function, cv2.calcHist(), that we used for a 1-D histogram. Just change the following parameters and the rest is the same.

  • channels: [0,1] for (Blue, Green), [1,2] for (G, R) and [0,2] for (B, R).
  • bins: specify for each channel according to your need, e.g. [256, 256].
  • range: [0, 256, 0, 256] for an 8-bit image.

Below is the sample code for this using OpenCV-Python
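A minimal sketch (the image path is a placeholder); this builds the 2-D histogram over the Blue and Green channels and displays it with nearest-neighbour interpolation, as advised below:

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg')                              # replace with your image path
hist = cv2.calcHist([img], [0, 1], None, [256, 256], [0, 256, 0, 256])

plt.imshow(hist, interpolation='nearest')                  # nearest-neighbour interpolation
plt.xlabel('Green intensity')
plt.ylabel('Blue intensity')
plt.colorbar()
plt.show()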

Always use Nearest Neighbour Interpolation when plotting a 2-D histogram.

Plotting a 2-D histogram using RGB channels is not a good choice as we cannot extract color information using 2 channels only. Still, this can be used for finding the correlation between channels, finding clipping or intensity proportions etc.

To extract color information, we need a color model in which two components/channels can solely represent the chromaticity (color) of the image. One such color model is HSV where H and S tell us about the color of the light. So, first convert the image from BGR to HSV and then apply the above code.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Understanding Image Histograms

In this blog, we will discuss image histogram which is a must-have tool in your pocket. This will help in contrast enhancement, image segmentation, image compression, thresholding etc. Let’s see what is an image histogram and how to plot histogram using OpenCV and matplotlib.

What is an Image Histogram?

An image histogram tells us how the intensity values are distributed in an image. In this, we plot the intensity values on the x-axis and the number of pixels corresponding to each intensity value on the y-axis. See the figure below.

This is called a 1D histogram because we are taking only one feature into consideration, i.e. the greyscale intensity value of the pixel. In the next blog, we will discuss 2D histograms.

Now, let’s understand some terminology associated with histograms.

Tonal range refers to the region where most of the intensity values are present (see the figure above). The left side represents the black and dark areas, known as shadows; the middle represents medium grey, or midtones; and the right side represents light and pure white areas, known as highlights.

So, for a dark image, the histogram covers mostly the left side and center of the graph, while for a bright image, the histogram mostly rests on the right side and center of the graph, as shown in the figure below.

Now, let’s see how to plot the histogram for an image using OpenCV and matplotlib.

OpenCV: To calculate the image histogram, OpenCV provides the following function

cv2.calcHist(image, channel, mask, bins, range) 

  • image : input image, should be passed in a list, e.g. [image]
  • channel : index of the channel. For greyscale pass [0]; for a color image pass the desired channel as [0], [1] or [2].
  • mask : provide this if you want to calculate the histogram for a specific region, otherwise pass None.
  • bins : no. of bins to use for each channel, should be passed as a list, e.g. [256]
  • range : range of intensity values. For an 8-bit image pass [0, 256]

This returns a numpy.ndarray with shape (n_bins, 1), which can then be plotted using matplotlib. Below is the code for this:
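A minimal sketch for a greyscale image (the image path is a placeholder):

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg', 0)                        # read as greyscale
hist = cv2.calcHist([img], [0], None, [256], [0, 256])  # shape (256, 1)
plt.plot(hist)
plt.xlabel('Intensity value')
plt.ylabel('No. of pixels')
plt.show()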

Matplotlib: Unlike OpenCV, matplotlib computes and plots the histogram directly using plt.hist().
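A short sketch of the same histogram computed by matplotlib alone:

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg', 0)                  # greyscale image, placeholder path
plt.hist(img.ravel(), bins=256, range=[0, 256])   # bins and plots in one step
plt.show()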

For a color image, we can show each channel individually or we can first convert it into greyscale and then calculate the histogram. So, a color histogram can be expressed as “Three Intensity(Greyscale) Histograms”, each of which shows the brightness distribution of each individual Red/Green/Blue color channel. Below figure summarizes this.

Original Color Image

So, always see the histogram of the image before doing any other pre-processing operation. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

DensePose

Recently, Facebook researchers released a paper named “DensePose: Dense Human Pose Estimation in the Wild”, which establishes dense correspondences from a 2D RGB image to the 3D surface of the human body, even in the presence of background, occlusions and scale variations.

DensePose can be understood as a group of problems, like object detection, pose estimation, and part and instance segmentation. This task can be applied to problems that require 3D understanding, some of which are:

  1. Graphics
  2. Augmented Reality
  3. Human-computer interaction
  4. General 3D based object understanding

Till now, these tasks have been tackled using depth sensors, which are highly costly. Several other works that aim to achieve dense correspondence use pairs or sets of images. But the DensePose method requires a single RGB image as input, and it focuses only on the most important visual category: humans.

For image-to-surface mapping of humans, recent studies use a two-stage method: first detecting joints in the body using a CNN and then fitting a deformable surface model such as SMPL. DensePose, in contrast, uses an end-to-end supervised learning method by collecting ground-truth correspondences between RGB images and a parametric surface model.

DensePose is inspired by the DenseReg framework, which focused mainly on faces. DensePose focuses on the full human body, which has its own challenges due to variation in poses and the high flexibility and complexity of the body. To address these problems, the authors have designed a suitable architecture based on Mask-RCNN.

This method consists of three stages:

  1. Manually collecting ground-truth datasets.
  2. Using CNN based models on collected datasets to predict dense correspondences.
  3. In-painting the constructed ground truths with a teacher network for better performance.

1. Collection of Dataset

Till now, no manually collected dataset existed for dense correspondence on real images. The authors have introduced the COCO-DensePose dataset, with annotations for 50K humans comprising more than 5 million manually annotated correspondences.

In this task, human annotators are involved to annotate 2D images to 3D surfaces. If this were done by directly annotating onto the 3D surface model, it would be cumbersome and very frustrating for annotators. So, the authors adopted a two-stage annotation pipeline and afterwards measured the accuracy of the human annotators.

In the first stage, the authors delineated visible body parts like the head, torso, legs, arms, hands and feet, and then designed these parts to be isomorphic to a plane.

To simplify the annotation, the authors divided the full body into 24 parts by flattening out the body, as shown below.

In the second stage, the authors used k-means to sample a maximum of 14 points on each part. Also, to simplify this task, annotators are provided with six pre-rendered views of the same body part and asked to annotate the point in the most suitable view. The surface coordinates of the annotated point are then used to mark it on the remaining views. See the figure below:

Accuracy of Human Annotators

Here, annotator accuracy is measured over synthetic data. To calculate the accuracy, the authors compare the geodesic distance between the true position generated from the synthetic data and the one estimated by annotators when bringing the synthesized image into correspondence with the 3D surface. The geodesic distance is the length of the shortest path between two vertices on the surface. The authors considered two types of evaluation measures for annotator accuracy.

Pointwise Evaluation: In this approach, the geodesic distance is used as a threshold for deciding the ratio of correct points. Varying the threshold gives a curve f(t), whose area under the curve summarizes the correspondence accuracy.

Per-instance Evaluation: For this type of evaluation, the authors introduced a geodesic point similarity (GPS) formula.
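For reference, the geodesic point similarity has roughly the following form (reconstructed from the paper; the notation may differ slightly):

GPS_j = \frac{1}{|P_j|} \sum_{p \in P_j} \exp\left( \frac{-g(i_p, \hat{i}_p)^2}{2\kappa^2} \right)

where P_j is the set of annotated points on person instance j, i_p is the ground-truth vertex for point p, \hat{i}_p is the estimated vertex, g(·,·) is the geodesic distance between them, and κ is a normalizing parameter.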

With the above formula, the GPS is calculated for every person instance in the image. Once this GPS matching score is calculated, they compute average precision and average recall with GPS thresholds ranging between 0.5 and 0.95.

After performing these evaluations, it was found that annotation errors are larger on the back and front of the torso and smaller on the head and hands.

2. Using CNN based architectures on Generated Dataset

After generating the ground-truth dataset, it is time to train a deep neural network to predict the dense correspondences. The authors experimented with both a fully convolutional network and a region-based network (like Mask-RCNN), and found the latter superior. They combined the DenseReg architecture with Mask-RCNN and introduced DensePose-RCNN.

Fully Convolutional Network:

In this method, they combined a classification and a regression task. In the classification task, a pixel is classified as either background or one of several body parts. Since the full body has been divided into 24 parts, this is a 25-class classification (one class for the background).

Here c* is the class with the highest probability in the classification task. After classifying which part a pixel belongs to, the network performs regression to find the U, V parameterization of its exact point on the 3D surface model. The regression is divided into 24 different regression tasks because each of the 24 body parts is treated independently with its own local coordinates.

Region Based Network:

After experimenting with the fully convolutional network, the authors found it not as fruitful as a region-based network. For the region-based network, they used exactly the same architecture as Mask-RCNN up to ROIAlign and then used a fully convolutional network for regression and classification, the same as DenseReg. This architecture can run at 25 fps for 320×240 images and at 5 fps for 800×1100 images.

3. Ground-truth Interpolation using In-painting Network

During annotation, in every training sample the annotators annotated only a sparse set of around 100-150 pixels. This does not hamper training, because pixels that are not annotated are excluded from the per-pixel loss. However, the authors found improved performance when they interpolated the annotations of the other pixels from the annotated ones, as shown below:

To do this, the authors trained a teacher network. They only interpolated points on humans and ignored the background to reduce background error.

The things to notice in this paper are the large-scale dataset of ground-truth image-to-surface correspondences and the combination of Mask R-CNN with the DenseReg architecture to predict surface correspondences. This paper can pave the way for further research and development in this field and can be a boon to the 3D modelling field.

Referenced Research paper: DensePose: Dense Human Pose Estimation in the Wild

Github : DensePose

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.