Here, we will learn how to use the TensorFlow Object Detection API with the computer's webcam to play a snake game. We will use hand gestures instead of the keyboard.
We will follow these steps to play the snake game using the TensorFlow Object Detection API:
- Generate the dataset.
- Convert the train and test datasets into TFRecord format.
- Fine-tune a pre-trained model on the generated data.
- Integrate the trained model with the snake game.
- Play the snake game using your own hand gestures.
In this blog, we will cover only the first step, i.e. how to create your own training data; the remaining steps will be covered in subsequent blogs.
Generate Dataset
A snake game generally has four directions of movement: up, down, right and left. For each of the four directions, we need to generate at least 100 images. You can use your phone or laptop camera to do this. Try to capture images against different backgrounds for better generalization. Below are some example images of hand gestures.
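If you want to capture the images straight from your webcam, here is a minimal sketch using OpenCV (install with pip install opencv-python). The class label, folder layout and 's'/'q' keybindings below are my own choices, not part of the original post:

```python
# Capture labeled hand-gesture frames from the default webcam.
# The label and output folder are placeholders -- adjust as needed.
import os
import cv2

label = "up"                # one of: up, down, left, right
out_dir = os.path.join("images", label)
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)   # default webcam
count = 0
while count < 100:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("s"):     # press 's' to save the current frame
        cv2.imwrite(os.path.join(out_dir, f"{label}_{count:03d}.jpg"), frame)
        count += 1
    elif key == ord("q"):   # press 'q' to quit early
        break

cap.release()
cv2.destroyAllWindows()
```

Run it once per direction, changing the label each time and varying the background between runs.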
Now we have our captured images of hand gestures. The next step is to annotate these images according to their classes, which means drawing rectangular boxes around the hand gestures and labeling them appropriately. Don't worry, there is a tool named LabelImg which makes it easy to annotate images for the training and test datasets. To get started with LabelImg, you can follow its GitHub link. The start screen of LabelImg looks like this.
On the left side of the screen, you can find various options. Click on Open Dir and choose the input image folder. Then click on Change Save Dir and select the output folder where the generated XML files will be saved. Each XML file will contain the coordinates of the rectangular boxes you drew in the image, something like this.
```xml
<annotation>
    <folder>Screenshots</folder>
    <filename>labellmg.png</filename>
    <path>\Pictures\Screenshots\labellmg.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>1920</width>
        <height>1080</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>left</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>277</xmin>
            <ymin>234</ymin>
            <xmax>814</xmax>
            <ymax>780</ymax>
        </bndbox>
    </object>
</annotation>
```
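This is the standard PASCAL VOC format that LabelImg writes. If you want to sanity-check an annotation, here is a sketch that reads one of these files with Python's standard library; the filename is a placeholder:

```python
# Parse a LabelImg (PASCAL VOC) XML annotation with the standard library.
# 'up_001.xml' is a placeholder filename.
import xml.etree.ElementTree as ET

tree = ET.parse("up_001.xml")
root = tree.getroot()

width = int(root.find("size/width").text)
height = int(root.find("size/height").text)
print(f"image size: {width}x{height}")

for obj in root.findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin = int(box.find("xmin").text)
    ymin = int(box.find("ymin").text)
    xmax = int(box.find("xmax").text)
    ymax = int(box.find("ymax").text)
    print(f"{name}: ({xmin}, {ymin}) -> ({xmax}, {ymax})")
```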
To draw a rectangular box in an image using LabelImg, just press 'W', drag out the box, label it and save. You can create one or more boxes in a single image, as shown in the figure below. Repeat this for all the images.
Now we have the images and their corresponding XML files. Next, we will split this dataset into training and test sets in a 90/10 ratio: for each class ('up', 'down', 'left' and 'right'), put 90% of the images and their corresponding XML files into one folder and the remaining 10% into another. A short script to do this split is sketched below.
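This is a minimal per-class split sketch. The folder layout (images/<label> containing matching .jpg and .xml pairs, plus flat train/ and test/ output folders) follows the capture script above and is an assumption, not the original post's layout:

```python
# Split each class 90/10 into train/ and test/, copying image + XML pairs.
import os
import random
import shutil

random.seed(42)                      # reproducible shuffle
labels = ["up", "down", "left", "right"]

for label in labels:
    src = os.path.join("images", label)
    files = sorted(f for f in os.listdir(src) if f.endswith(".jpg"))
    random.shuffle(files)
    n_train = int(0.9 * len(files))  # 90% of this class goes to train
    for i, img in enumerate(files):
        dest = "train" if i < n_train else "test"
        os.makedirs(dest, exist_ok=True)
        xml = os.path.splitext(img)[0] + ".xml"
        shutil.copy(os.path.join(src, img), os.path.join(dest, img))
        shutil.copy(os.path.join(src, xml), os.path.join(dest, xml))
```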
That's all for creating your own training and test datasets.
Next Blog: Snake Game Using Tensorflow Object Detection API – Part II
Hope you enjoyed reading.
If you have any doubts or suggestions, please feel free to ask and I will do my best to help or improve. Goodbye until next time.