
Creating a Snake Game using OpenCV-Python

Isn’t it interesting to create a snake game using OpenCV-Python? And what if I tell you that you only need the following functions:

  • cv2.imshow()
  • cv2.waitKey()
  • cv2.putText()
  • cv2.rectangle()

So, let’s get started.

Import Libraries

For this, we only need four libraries.
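A minimal sketch of these imports (the exact set is an assumption; random and time are used later for apple placement and frame timing):

import cv2
import numpy as np
import random
import time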

Displaying Game Objects

  • Game Window: Here, I have used a 500×500 image as my game window.
  • Snake and Apple: I have used green squares to display the snake and a red square for the apple. Each square has a size of 10 units.
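A hedged sketch of how these objects can be drawn with cv2.rectangle(); the starting positions, colors and window name here are assumptions:

# 500x500 black image as the game window; snake as green 10x10 squares, apple as a red square
img = np.zeros((500, 500, 3), dtype='uint8')
snake_head = [250, 250]
snake_position = [[250, 250], [240, 250], [230, 250]]      # hypothetical starting body
apple_position = [random.randrange(1, 50) * 10, random.randrange(1, 50) * 10]
score = 0

cv2.rectangle(img, (apple_position[0], apple_position[1]),
              (apple_position[0] + 10, apple_position[1] + 10), (0, 0, 255), 3)
for position in snake_position:
    cv2.rectangle(img, (position[0], position[1]),
                  (position[0] + 10, position[1] + 10), (0, 255, 0), 3)
cv2.imshow('Snake Game', img)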

Game Rules

Now, let’s define some game rules

  • Collision with boundaries: If the snake collides with the boundaries, it dies.
  • Collision with self: If the snake collides with itself, it should die. For this, we only need to check whether the snake’s head is in the snake’s body or not.
  • Collision with apple: If the snake collides with the apple, the score is increased and the apple is moved to a new location.

Also, on eating an apple, the snake’s length should increase; otherwise, the snake keeps moving as it is.

  • The snake game has a fixed time window for a keypress. If you press a key within that time, the snake moves in that direction; otherwise, it continues moving in the previous direction. Sadly, with OpenCV’s cv2.waitKey() function, if you hold down a direction key, the snake starts moving faster in that direction. So, to make the snake’s movement uniform, I did something like this.
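A minimal sketch of the idea: wait a fixed amount of time per frame and keep only the first key pressed in that window (the 200 ms frame duration is a placeholder):

t_end = time.time() + 0.2        # hypothetical frame duration of 200 ms
k = -1
while time.time() < t_end:
    if k == -1:
        k = cv2.waitKey(125)     # returns -1 when no key is pressed
    else:
        cv2.waitKey(1)           # keep pumping events, ignore further presses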

Because cv2.waitKey() returns -1 when no key is pressed, ‘k’ stores the first key pressed within that window. Since the while loop runs for a fixed time, it doesn’t matter how quickly you pressed a key; the loop always waits for the same duration.

  • Snake cannot move backward: Here, I have used the w, a, s, d keys for moving the snake. If the snake was moving right and we press the left key, it keeps moving right; in short, the snake cannot directly reverse direction.

After seeing which direction key is pressed, we change the head position accordingly.
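A sketch of both steps together; the direction codes 0/1/2/3 for left/right/down/up are a hypothetical encoding, not taken from the original code:

if k == ord('a') and prev_button_direction != 1:      # cannot go left while moving right
    button_direction = 0
elif k == ord('d') and prev_button_direction != 0:    # cannot go right while moving left
    button_direction = 1
elif k == ord('s') and prev_button_direction != 3:    # cannot go down while moving up
    button_direction = 2
elif k == ord('w') and prev_button_direction != 2:    # cannot go up while moving down
    button_direction = 3
else:
    button_direction = prev_button_direction
prev_button_direction = button_direction

# move the head one 10-unit square in the chosen direction
if button_direction == 1:
    snake_head[0] += 10
elif button_direction == 0:
    snake_head[0] -= 10
elif button_direction == 2:
    snake_head[1] += 10
elif button_direction == 3:
    snake_head[1] -= 10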

Displaying the final Score

For displaying the final score, I have used the cv2.putText() function.
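A hedged sketch of this step (the text position, font and message are placeholders):

img = np.zeros((500, 500, 3), dtype='uint8')
cv2.putText(img, 'Your Score is {}'.format(score), (140, 250),
            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
cv2.imshow('Snake Game', img)
cv2.waitKey(0)
cv2.destroyAllWindows()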

Finally, our snake game is ready and looks like this

The full code can be found here.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Python Curses

In this tutorial we will learn how to create a snake game using python and curses.

What is Curses?

Curses is a library that can be used to create text-based user interface applications. It is a terminal-control library, i.e. code written using curses can only be run through a terminal.

Import Libraries

To start with creating a snake game using curses, we first need to import the following libraries:
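A minimal sketch of the imports, assuming the game only needs curses and a random number generator for the apple position:

import curses
from random import randint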

The above import works fine on Linux-based systems. To make it work on Windows, you need to install curses: download the curses package matching your Python version from the Python extension packages page and then run the following command:
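A hedged example of that command; the wheel filename is a placeholder, replace it with the file you actually downloaded:

pip install curses-2.2-cp36-cp36m-win_amd64.whl   # use the name of your downloaded .whl file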

Initializing Game Screen

After importing the required libraries, we first need to initialize the game screen and get the maximum height and width of the opened terminal screen. Using this height and width, we will create a window for the game.
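A minimal sketch of this setup, assuming the window spans the whole terminal:

sc = curses.initscr()              # initialize the game screen
h, w = sc.getmaxyx()               # maximum height and width of the terminal
win = curses.newwin(h, w, 0, 0)    # game window covering the whole screen
win.keypad(1)
curses.curs_set(0)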

In the above code, win.keypad(1) enables the window to read the user’s key presses (including the arrow keys) for the game. Also, curses.curs_set(0) makes the cursor invisible on the screen.

Initialize snake and apple initial positions

Next, we will initialize the starting positions of the snake and the apple (food) on the game screen. Also, we will initialize our game score to zero.
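A hedged sketch of the initialization; the exact starting coordinates are assumptions:

snake_position = [[h // 2, w // 4], [h // 2, w // 4 - 1], [h // 2, w // 4 - 2]]  # [y, x] cells
apple_position = [h // 2, w // 2]
score = 0
win.addch(apple_position[0], apple_position[1], curses.ACS_DIAMOND)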

In the above code, win.addch() adds a diamond-like symbol on the game screen at the specified apple position.

Specifying Game Over Conditions

For the snake game, there are basically two conditions that define how the game ends: first, if the snake collides with one of the game window boundaries, and second, if the snake collides with itself.
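A hedged sketch of these two checks, to be called inside the game loop:

def collision_with_boundaries(head, h, w):
    # the window border occupies row/column 0 and h-1 / w-1
    return head[0] in (0, h - 1) or head[1] in (0, w - 1)

def collision_with_self(snake_position):
    # the head is the first cell; dying means it re-enters the body
    return snake_position[0] in snake_position[1:]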

Playing the Game

In this game, we will use four keyboard keys: ‘up’, ‘down’, ‘left’ and ‘right’. To get the user’s input from the keyboard, we use the win.getch() function.
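A sketch of the start of the main loop; the 150 ms timeout and the starting direction are assumptions:

prev_button_direction = curses.KEY_RIGHT     # assume the snake starts moving right
win.timeout(150)                             # wait up to 150 ms for a key each frame

while True:
    win.border(0)
    next_key = win.getch()                   # returns -1 if no key was pressed in time
    key = prev_button_direction if next_key == -1 else next_key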

In the above code win.border(0) will create a border around our game screen.

Now we will see the logic to move the snake and eat the apple. According to the game rules, the snake continues to move in the same direction if the user does not press any key. Also, the snake cannot move backward. If the user presses a key, we need to update the snake head’s position. Let’s see the code:
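A hedged sketch continuing inside the while loop from the previous snippet:

    # ignore a key press that would make the snake reverse onto itself
    if key == curses.KEY_LEFT and prev_button_direction != curses.KEY_RIGHT:
        button_direction = curses.KEY_LEFT
    elif key == curses.KEY_RIGHT and prev_button_direction != curses.KEY_LEFT:
        button_direction = curses.KEY_RIGHT
    elif key == curses.KEY_UP and prev_button_direction != curses.KEY_DOWN:
        button_direction = curses.KEY_UP
    elif key == curses.KEY_DOWN and prev_button_direction != curses.KEY_UP:
        button_direction = curses.KEY_DOWN
    else:
        button_direction = prev_button_direction
    prev_button_direction = button_direction

    # compute the new head one cell in the chosen direction
    new_head = [snake_position[0][0], snake_position[0][1]]
    if button_direction == curses.KEY_RIGHT:
        new_head[1] += 1
    elif button_direction == curses.KEY_LEFT:
        new_head[1] -= 1
    elif button_direction == curses.KEY_UP:
        new_head[0] -= 1
    elif button_direction == curses.KEY_DOWN:
        new_head[0] += 1
    snake_position.insert(0, new_head)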

Next, there are two situations: either the snake simply moves to a new position in the next step, or it eats the apple. If the snake only moves, we add one unit at its head in the pressed direction and remove one unit from its tail. If the snake eats the apple, we add one unit at its head but do not remove anything from its tail, and we display a new apple at a different location. Let’s see the code:
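A hedged sketch, still inside the game loop (the snake body character ‘#’ is an assumption):

    if snake_position[0] == apple_position:                     # the snake eats the apple
        score += 1
        apple_position = [randint(1, h - 2), randint(1, w - 2)]
        win.addch(apple_position[0], apple_position[1], curses.ACS_DIAMOND)
    else:                                                        # the snake just moves
        tail = snake_position.pop()
        win.addch(tail[0], tail[1], ' ')                         # erase the old tail
    win.addch(snake_position[0][0], snake_position[0][1], '#')   # draw the new head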

Then, finally, we will display the score on the screen and quit the game window.
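A sketch of this last step, run once a game-over condition is met:

sc.addstr(h // 2, w // 2 - 7, 'Final Score: ' + str(score))
sc.refresh()
sc.getch()            # wait for a key press before closing
curses.endwin()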

Here are some images of the game that we have just created.

The full code can be found here.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API – Part IV

In the last blog, we trained the model and saved the inference graph. In this blog, we will learn how to use this inference graph for object detection and how to run our snake game using this trained object detection model.

To play the snake game using this trained model, you first need a snake game. But don’t worry, you need not develop it from scratch; you can clone this repository. And if you want to know the algorithm behind this code, you can follow this blog.

Now that we have our snake game, the next thing is to use this object detection model to play it. To do this, we need to run both the snake game file and the following script from the models/research folder simultaneously.
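The full script is longer; below is a condensed, hedged sketch of its core loop, assuming TensorFlow 1.x and the standard object_detection utilities on the Python path. The paths, score threshold and gesture-to-key mapping are placeholders, not the original values.

import cv2
import numpy as np
import pyautogui
import tensorflow as tf
from object_detection.utils import label_map_util

PATH_TO_CKPT = 'snake/frozen_inference_graph.pb'      # path to the exported inference graph
PATH_TO_LABELS = 'images/object-detection.pbtxt'      # path to the label map
NUM_CLASSES = 4

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# load the frozen graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

keys = {'up': 'w', 'down': 's', 'left': 'a', 'right': 'd'}   # hypothetical gesture-to-key mapping
cap = cv2.VideoCapture(0)

with detection_graph.as_default(), tf.Session(graph=detection_graph) as sess:
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        (b, s, c) = sess.run([boxes, scores, classes],
                             feed_dict={image_tensor: np.expand_dims(frame, axis=0)})
        if s[0][0] > 0.5:                                    # best detection above a threshold
            gesture = category_index[int(c[0][0])]['name']
            pyautogui.press(keys[gesture])                   # send the key to the snake game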

In the above code, we need to specify the path to our inference graph using the PATH_TO_CKPT variable. We also need to set the PATH_TO_LABELS variable to the path of the object-detection.pbtxt file, and specify the number of classes, i.e. 4 in our case.

In the above script, we have used pyautogui to press the corresponding key when the hand gesture for a particular direction is detected.

Finally, you can play the snake game using your hand gestures. Let’s see some of the results.

Pretty well, yeah. This is all for playing the snake game using the TensorFlow object detection API. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API – Part III

In the previous blogs we have seen how to generate data for object detection and convert it into TFRecord format to train the model. In this blog we will learn how to use this data to train the model.

To train the model, we will take a pre-trained model and use transfer learning to train it on our dataset. I have used the MobileNet pre-trained model; here is the mobilenet model. For its configuration file, go to models -> research -> object_detection -> samples -> configs -> ssd_mobilenet_v1_pets.config.
The downloaded configuration file needs to be edited as per our requirements. In the configuration file, we have changed the number of classes, the number of training steps, the path to the model checkpoint, and the paths to the pbtxt files, as shown below.
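A hedged excerpt of the kind of edits made to ssd_mobilenet_v1_pets.config; the '...' marks omitted parts of the file, and the step count and paths are placeholders to adjust to your own layout:

model {
  ssd {
    num_classes: 4                              # up, down, left, right
    ...
  }
}
train_config: {
  fine_tune_checkpoint: "images/model.ckpt"     # downloaded MobileNet checkpoint
  num_steps: 10000                              # placeholder step count
  ...
}
train_input_reader: {
  tf_record_input_reader { input_path: "images/data/train.record" }
  label_map_path: "images/object-detection.pbtxt"
}
eval_input_reader: {
  tf_record_input_reader { input_path: "images/data/test.record" }
  label_map_path: "images/object-detection.pbtxt"
}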

For the object-detection.pbtxt file, create a pbtxt file and put the following text inside it to specify the labels for our problem.
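A hedged example of the label map; the class names match the four gestures, but their order and ids here are assumptions and must match the condition used in generate_tfrecord.py:

item {
  id: 1
  name: 'up'
}
item {
  id: 2
  name: 'down'
}
item {
  id: 3
  name: 'left'
}
item {
  id: 4
  name: 'right'
}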

Now go to models -> research -> object_detection -> legacy and copy the train.py file to the models -> research folder.

Then create a folder named images inside the models -> research folder. Put your MobileNet model, configuration file, train and test image data folders, and train and test CSV label files inside it. Inside the training_data folder, create a folder named data and put your train and test TFRecord files there. The hierarchy will look like this:

Also create a training folder inside the images folder, where the model will save its checkpoints. Now, from the models -> research folder, run the following command to train the model.
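A typical invocation of the legacy train.py; the paths assume the folder layout described above, so adjust them to your own:

python train.py --logtostderr --train_dir=images/training --pipeline_config_path=images/ssd_mobilenet_v1_pets.config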

The training time will depend on your machine configuration and the number of steps you have specified in the configuration file.

Now we have our trained model and its checkpoints are saved inside the models/research/images/training folder. In order to test this model and use this model to detect objects we need to export the inference graph.

To do this, first copy models/research/object_detection/export_inference_graph.py to the models/research folder. Then, inside the models/research folder, create a folder named “snake” which will store the inference graph. From the models -> research folder, run the following command:
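A typical form of the export command; the checkpoint number is a placeholder, use the latest checkpoint saved in images/training:

python export_inference_graph.py --input_type image_tensor --pipeline_config_path images/ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix images/training/model.ckpt-10000 --output_directory snake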

Now we have frozen_inference_graph.pb inside the models/research/snake folder, which will be used to detect objects using the trained model.

This is all for training the model and saving the inference graph. In the next blog, we will see how to use this inference graph for object detection and how to run our snake game using this trained object detection model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part IV

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API – Part II

In the previous blog, we did two things: first, we created a dataset, and second, we split it into training and test sets. In this blog, we will learn how to convert this dataset into the TFRecord format for training.

Before creating the TFRecord files, we just need to do one more step. In the last blog, we generated XML files using LabelImg. To get labels for the training and test datasets, we need to convert these XML files into CSV format. To do this, we will use the following code, which has been taken from this repository.
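A hedged reconstruction of that conversion script (the widely used xml_to_csv utility); the 'images/train' and 'images/test' paths in main() are placeholders:

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),    # image width
                     int(root.find('size')[1].text),    # image height
                     member[0].text,                     # class label
                     int(member[4][0].text),             # xmin
                     int(member[4][1].text),             # ymin
                     int(member[4][2].text),             # xmax
                     int(member[4][3].text))             # ymax
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(xml_list, columns=column_name)


def main():
    for folder in ['train', 'test']:
        xml_df = xml_to_csv(os.path.join('images', folder))   # set your own XML folders here
        xml_df.to_csv('{}_labels.csv'.format(folder), index=None)


main()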

In the above main function, you should specify the paths to your XML files for both the train and test folders. The generated CSV files will contain the filename, width and height of each image, the output label, and the coordinates of the annotated rectangular box, as shown in the figure below.

Once you have your train and test images with labels in CSV format, let’s convert data in TFRecord format.

A TFRecord file stores your data as a sequence of binary strings, which has many advantages over normal data formats. To create it, we will use the following code, which has been taken from this repository. According to your requirements, you need to change the condition for labels at line 31 below.
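The full generate_tfrecord.py script is in the linked repository; for illustration, the label condition it refers to is usually a small helper like this (the class names and ids here are assumptions and must match your object-detection.pbtxt):

def class_text_to_int(row_label):
    # map each class name from the CSV to the id used in the label map
    if row_label == 'up':
        return 1
    elif row_label == 'down':
        return 2
    elif row_label == 'left':
        return 3
    elif row_label == 'right':
        return 4
    else:
        return None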

Save the full script from the repository in a file named generate_tfrecord.py. Now, in order to use this code, we first need to clone the TensorFlow object detection API. For that, do the following:
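The standard way to get the object detection API is to clone the TensorFlow models repository:

git clone https://github.com/tensorflow/models.git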

Then we need to do the following steps to avoid getting a protoc error:

  1. Go to this release link and download protobuf according to your operating system.
  2. Extract the downloaded file and go to the bin folder inside it.
  3. Copy the protoc.exe file and put it in the models -> research -> object_detection -> protos folder.
  4. In the protos folder, run the following command for the .proto files.
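A commonly used form of this command; note that many setups run it from the models/research folder instead, so adjust the working directory and the path to protoc.exe as needed:

protoc object_detection/protos/*.proto --python_out=.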

After cloning this repository, copy generate_tfrecord.py into the models -> research folder and run the following commands.
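A typical pair of invocations, assuming the CSV files generated earlier are named train_labels.csv and test_labels.csv (adjust the paths to your layout):

python generate_tfrecord.py --csv_input=images/train_labels.csv --output_path=train.record
python generate_tfrecord.py --csv_input=images/test_labels.csv --output_path=test.record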

The above commands will generate two files named train.record and test.record, which will be used for training the model.

This is all for generating the TFRecord files. In the next blog, we will perform training and testing of the object detection model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part III

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API

Here, we will learn how to use tensorflow object detection API with the computer’s webcam to play a snake game. We will use hand gestures instead of the keyboard.

We will use following steps to play snake game using tensorflow object detection API:

  1. Generate dataset.
  2. Convert train and test datasets into tfrecord format.
  3. Train a pre-trained model using generated data.
  4. Integrate trained model with snake game.
  5. Play the snake game using your own hand gestures.

In this blog, we will cover only the first step, i.e. how to create your own training data; the remaining steps will be covered in subsequent blogs.

You can find code here.

Generate Dataset

A snake game generally involves four directions of movement: up, down, right and left. We need to generate at least 100 images per direction. You can use your phone or laptop camera to do this. Try to capture images with different backgrounds for better generalization. Below are some example images of hand gestures.

Hand Gestures

Now we have our captured images of hand gestures. The next step is to annotate these images according to their classes, which means we need to draw rectangular boxes around the hand gestures and label them appropriately. Don’t worry, there is a tool named LabelImg which is very helpful for annotating images to create training and test datasets. To get started with LabelImg, you can follow their GitHub link. The start screen of LabelImg looks like this.

At the left side of the screen, you can find various options. Click on Open Dir and choose the input image folder. Then click on Change Save Dir and select the output folder where the generated XML files will be saved. Each XML file contains the coordinates of the rectangular boxes drawn in the image, something like this.

To create a rectangular box in an image using LabelImg, you just need to press ‘W’, then draw the box and save it. You can create one or multiple boxes in one image, as shown in the figure below. Repeat this for all the images.

Now we have the images and their corresponding XML files. Next, we will split this dataset into training and test sets in a 90/10 ratio: put 90% of the images of each class (‘up’, ‘right’, ‘left’ and ‘down’) and their corresponding XML files in one folder, and the other 10% in another folder.

That’s all for creating the dataset. In the next blog, we will see how to create TFRecord files from these datasets, which will be used for training the model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part II

 Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game using Real Time Speech Recognition

Speech recognition can be very helpful in your daily activities: you can switch your laptop on and off, control your TV and AC, and handle other home appliances. In this blog, we will learn a fun activity, playing a snake game using voice control. Once you learn this method, you can apply it to other real-life applications.

To perform speech recognition, you need to train a model, and training a model requires a large amount of data and a lot of time. To save this time, I was searching for a pre-trained model. There are some open-source APIs, but they are not that accurate or fast. Luckily, I got to know about Porcupine. Porcupine is a self-service, highly accurate, and lightweight wake word (voice control) engine. In this blog, I will show you how to play a snake game using the Porcupine GitHub repository.

To play a snake game, you first need to develop one. But don’t worry, you need not develop it from scratch; you can clone this repository. And if you want to know the algorithm behind this code, you can follow this blog.

Now that you have a snake game, the next thing is how to use speech recognition to play it. First, clone the Porcupine repository into your system. It has some pre-trained wake words defined in it, and you can also create your own. For this problem, I have used four wake words: “go left”, “go right”, “go down” and “snake up”.

Here are the steps to play snake game using voice control:

  • First, go to the Porcupine directory that you have cloned.
  • Then go to tools -> optimizer -> System (windows, linux or mac) -> os type (64 or 32 bit).
  • Then use the pv_porcupine_optimizer.exe file to create the wake word files. To do this, you need the command shown after this list.
  • Here -r corresponds to the resource directory, which you can find inside the Porcupine directory; -w corresponds to the wake word of your choice; -o corresponds to the output directory for your wake word file; and -p corresponds to your platform (windows, linux or mac).
  • To generate the four different wake words, I ran the above command four times.
  • Now that you have created your wake words, the next thing is to integrate them with your snake game Python code.
  • Inside the Porcupine folder, go to binding -> python. There is a file named porcupine.py.
  • Open the porcupine.py file and append the following code at the end of it. You will also need to install pyaudio and pyautogui using pip.
  • Now run both the porcupine.py file and your snake game code to play it with voice control.
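For illustration, one invocation of the optimizer mentioned above might look like this; the resource and output paths and the platform are placeholders, and you run it once per wake word:

pv_porcupine_optimizer.exe -r resources/ -w "go left" -o output/ -p windows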

Now you have an idea of how to use real-time speech recognition. Hopefully you can find some real-life applications to apply it to.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

2D Histogram

In the last blog, we discussed 1-D histograms, in which we analyze each channel separately. Suppose we want to find the correlation between image channels, say how many times a (red, green) pair of (100, 56) appears in an image. In such a case, a 1-D histogram fails, as it does not show the relationship between the intensities of the two channels at the same position.

To solve this problem, we need multi-dimensional histograms, such as 2-D or 3-D. With the help of 2-D histograms, we can analyze the channels together in groups of 2 (RG, GB, BR), or all together with 3-D histograms. Let’s see what a 2-D histogram is and how to construct one using OpenCV-Python.

A 2-D histogram counts the occurrence of combinations of intensities. Below figure shows a 2D histogram

Here, the Y and X-axis correspond to the Red and Green channel ranges (for 8-bit, [0, 255]), and each point within the histogram shows the frequency corresponding to each R and G pair. Frequency is color-coded here; otherwise, another dimension would be needed.

Let’s understand how to construct a 2-D histogram by taking a simple example.

Suppose we have 4×4, 2-bit images of the Red and Green channels (as shown below) and we want to plot their 2-D histogram.

  • First, we plot the R and G channel ranges (here, [0, 3]) on the X and Y-axis respectively. This will be our 2-D histogram.
  • Then, loop over each position within the channels, find the frequency of the corresponding intensity pair and plot it in the 2-D histogram. These frequencies are then color-coded for ease of visualization.

Now, let’s see how to construct a 2-D histogram using OpenCV-Python

We use the same function, cv2.calcHist(), that we used for a 1-D histogram. Just change the following parameters and the rest is the same.

  • channels: [0,1] for (Blue, Green), [1,2] for (G, R) and [0,2] for (B, R).
  • bins: specify for each channel according to your need, e.g. [256, 256].
  • range: [0, 256, 0, 256] for an 8-bit image.

Below is the sample code for this using OpenCV-Python
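A minimal sketch (the image path is a placeholder); this builds the 2-D histogram over the Blue and Green channels and displays it with nearest-neighbour interpolation, as advised below:

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg')                              # replace with your image path
hist = cv2.calcHist([img], [0, 1], None, [256, 256], [0, 256, 0, 256])

plt.imshow(hist, interpolation='nearest')                  # nearest-neighbour interpolation
plt.xlabel('Green intensity')
plt.ylabel('Blue intensity')
plt.colorbar()
plt.show()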

Always use Nearest Neighbour Interpolation when plotting a 2-D histogram.

Plotting a 2-D histogram using RGB channels is not a good choice as we cannot extract color information using 2 channels only. Still, this can be used for finding the correlation between channels, finding clipping or intensity proportions etc.

To extract color information, we need a color model in which two components/channels can solely represent the chromaticity (color) of the image. One such color model is HSV where H and S tell us about the color of the light. So, first convert the image from BGR to HSV and then apply the above code.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Understanding Image Histograms

In this blog, we will discuss image histogram which is a must-have tool in your pocket. This will help in contrast enhancement, image segmentation, image compression, thresholding etc. Let’s see what is an image histogram and how to plot histogram using OpenCV and matplotlib.

What is an Image Histogram?

An image histogram tells us how the intensity values are distributed in an image. In this, we plot the intensity values on the x-axis and the number of pixels corresponding to each intensity value on the y-axis. See the figure below.

This is called a 1D histogram because we are taking only one feature into consideration, i.e. the greyscale intensity value of the pixel. In the next blog, we will discuss 2D histograms.

Now, let’s understand some terminology associated with histograms.

Tonal range refers to the region where most of the intensity values are present (see the figure above). The left side represents the black and dark areas, known as shadows; the middle represents medium grey, or midtones; and the right side represents light and pure white areas, known as highlights.

So, for a dark image, the histogram covers mostly the left side and center of the graph, while for a bright image, the histogram mostly rests on the right side and center of the graph, as shown in the figure below.

Now, let’s see how to plot the histogram for an image using OpenCV and matplotlib.

OpenCV: To calculate the image histogram, OpenCV provides the following function

cv2.calcHist(image, channel, mask, bins, range) 

  • image : input image, should be passed in a list, e.g. [image]
  • channel : index of the channel. For greyscale pass [0]; for a color image pass the desired channel as [0], [1] or [2].
  • mask : provide this if you want to calculate the histogram for a specific region, otherwise pass None.
  • bins : no. of bins to use for each channel, should be passed as a list, e.g. [256]
  • range : range of intensity values. For an 8-bit image pass [0, 256]

This returns a numpy.ndarray with shape (n_bins, 1), which can then be plotted using matplotlib. Below is the code for this:
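A minimal sketch for a greyscale image (the image path is a placeholder):

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg', 0)                        # read as greyscale
hist = cv2.calcHist([img], [0], None, [256], [0, 256])  # shape (256, 1)
plt.plot(hist)
plt.xlabel('Intensity value')
plt.ylabel('No. of pixels')
plt.show()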

Matplotlib: Unlike OpenCV, matplotlib computes and plots the histogram directly using plt.hist().
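A short sketch of the same histogram computed by matplotlib alone:

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg', 0)                  # greyscale image, placeholder path
plt.hist(img.ravel(), bins=256, range=[0, 256])   # bins and plots in one step
plt.show()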

For a color image, we can show each channel individually or we can first convert it into greyscale and then calculate the histogram. So, a color histogram can be expressed as “Three Intensity(Greyscale) Histograms”, each of which shows the brightness distribution of each individual Red/Green/Blue color channel. Below figure summarizes this.

Original Color Image

So, always see the histogram of the image before doing any other pre-processing operation. Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

DensePose

Recently, Facebook researchers released a paper named “DensePose: Dense Human Pose Estimation in the Wild”, which establishes dense correspondences from a 2D RGB image to the 3D surface of the human body, even in the presence of background, occlusions and scale variations.

DensePose can be understood as a group of problems, like object detection, pose estimation, and part and instance segmentation. This task can be applied to problems that require 3D understanding, some of which are:

  1. Graphics
  2. Augmented Reality
  3. Human-computer interaction
  4. General 3D based object understanding

Till now, these tasks have been tackled using depth sensors, which are highly costly. Several other works that aim to achieve dense correspondence use pairs or sets of images. But the DensePose method requires a single RGB image as input, and it focuses only on the most important visual category: humans.

For image-to-surface mapping of humans, recent studies use a two-stage method: first detecting joints in the body using a CNN and then fitting a deformable surface model such as SMPL. DensePose, in contrast, uses an end-to-end supervised learning method by collecting ground-truth correspondences between RGB images and a parametric surface model.

DensePose is inspired by the DenseReg framework, which focused mainly on faces. DensePose focuses on the full human body, which has its own challenges due to variation in poses and the high flexibility and complexity of the body. To address these problems, the authors have designed a suitable architecture based on Mask-RCNN.

This method consists of three stages:

  1. Manually collecting ground-truth datasets.
  2. Using CNN based models on collected datasets to predict dense correspondences.
  3. In-painting the constructed ground truths with a teacher network for better performance.

1. Collection of Dataset

Till now, no manually collected dataset existed for dense correspondence on real images. The authors have introduced the COCO-DensePose dataset, with annotations for 50K humans comprising more than 5 million manually annotated correspondences.

In this task, human annotators are involved to annotate 2D images to 3D surfaces. If this were done by directly annotating onto the 3D surface model, it would be cumbersome and very frustrating for annotators. So, the authors adopted a two-stage annotation pipeline and afterwards measured the accuracy of the human annotators.

In the first stage, the authors delineated visible body parts like the head, torso, legs, arms, hands and feet, and then designed these parts to be isomorphic to a plane.

To simplify the annotation, the authors divided the full body into 24 parts by flattening out the body, as shown below.

In the second stage, the authors used k-means to sample a maximum of 14 points on each part. Also, to simplify this task, annotators are provided with six pre-rendered views of the same body part and asked to annotate the point in the most suitable view. The surface coordinates of the annotated point are then used to mark it on the remaining views. See the figure below:

Accuracy of Human Annotators

Here, annotator accuracy is measured over synthetic data. To calculate the accuracy, the authors compare the geodesic distance between the true position generated from the synthetic data and the one estimated by annotators when bringing the synthesized image into correspondence with the 3D surface. The geodesic distance is the length of the shortest path between two vertices on the surface. The authors considered two types of evaluation measures for annotator accuracy.

Pointwise Evaluation: In this approach, the geodesic distance is used as a threshold for deciding the ratio of correct points. Varying the threshold gives a curve f(t), whose area under the curve summarizes the correspondence accuracy.

Per-instance Evaluation: For this type of evaluation, the authors introduced a geodesic point similarity (GPS) formula.
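For reference, the geodesic point similarity has roughly the following form (reconstructed from the paper; the notation may differ slightly):

GPS_j = \frac{1}{|P_j|} \sum_{p \in P_j} \exp\left( \frac{-g(i_p, \hat{i}_p)^2}{2\kappa^2} \right)

where P_j is the set of annotated points on person instance j, i_p is the ground-truth vertex for point p, \hat{i}_p is the estimated vertex, g(·,·) is the geodesic distance between them, and κ is a normalizing parameter.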

With the above formula, the GPS is calculated for every person instance in the image. Once this GPS matching score is calculated, they compute average precision and average recall with GPS thresholds ranging between 0.5 and 0.95.

After performing these evaluations, it was found that annotation errors are larger on the back and front of the torso and smaller on the head and hands.

2. Using CNN based architectures on Generated Dataset

After generating the ground-truth dataset, it is time to train a deep neural network to predict the dense correspondences. The authors experimented with both a fully convolutional network and a region-based network (like Mask-RCNN), and found the latter superior. They combined the DenseReg architecture with Mask-RCNN and introduced DensePose-RCNN.

Fully Convolutional Network:

In this method, they combined a classification and a regression task. In the classification task, a pixel is classified as either background or one of several body parts. Since the full body has been divided into 24 parts, this is a 25-class classification (one class for the background).

Here c* is the class with the highest probability in the classification task. After classifying which part a pixel belongs to, the network performs regression to find the U, V parameterization of its exact point on the 3D surface model. The regression is divided into 24 different regression tasks because each of the 24 body parts is treated independently with its own local coordinates.

Region Based Network:

After experimenting with the fully convolutional network, the authors found it not as fruitful as a region-based network. For the region-based network, they used exactly the same architecture as Mask-RCNN up to ROIAlign and then used a fully convolutional network for regression and classification, the same as DenseReg. This architecture can run at 25 fps for 320×240 images and at 5 fps for 800×1100 images.

3. Ground-truth Interpolation using In-painting Network

During annotation, in every training sample the annotators annotated only a sparse set of around 100-150 pixels. This does not hamper training, because pixels that are not annotated are excluded from the per-pixel loss. However, the authors found improved performance when they interpolated the annotations of the other pixels from the annotated ones, as shown below:

To do this, the authors trained a teacher network. They only interpolated points on humans and ignored the background to reduce background error.

The things to notice in this paper are the large-scale dataset of ground-truth image-to-surface correspondences and the combination of Mask R-CNN with the DenseReg architecture to predict surface correspondences. This paper can pave the way for further research and development in this field and can be a boon to the 3D modelling field.

Referenced Research paper: DensePose: Dense Human Pose Estimation in the Wild

Github : DensePose

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.