
Optical Character Recognition Pipeline: Generating Dataset

The first step in creating any deep learning model is to generate the dataset. Continuing our optical character recognition pipeline, in this blog we will see how to get our training and test data.

In our OCR pipeline, we first need to get data for both segmentation and text recognition. For the segmentation part, the data will consist of images and corresponding files containing the coordinates of the words present in each image. Let’s see an example.

For the recognition part, the data will consist of images and their corresponding text files. Here each segmented image will contain a single word.

Image and Text

Open Source Datasets:

There are some open source datasets available for our pipeline. For the segmentation part, here are some useful open source datasets.

Now let’s see some of the open source datasets for text recognition (images and their corresponding texts).

Synthetic Data:

In some cases, training your OCR model with synthetic data can also be useful. You can create your own synthetic data using a Python script, and you can add geometric transformations to simulate real-world distortion in the data. As an example, here is a script to generate synthetic data for text recognition:
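Below is a minimal sketch of such a script using Pillow; the word list and font file paths are placeholders you should replace with your own, and geometric transformations can be added on top.

    import os
    from PIL import Image, ImageDraw, ImageFont

    # Placeholder word list and font files -- substitute your own
    # dictionary and the TrueType fonts available on your system.
    words = ["the", "quick", "brown", "fox", "jumps"]
    font_files = ["arial.ttf", "times.ttf", "cour.ttf"]

    out_dir = "synthetic_words"
    os.makedirs(out_dir, exist_ok=True)

    sample_id = 0
    for word in words:
        for font_file in font_files:
            font = ImageFont.truetype(font_file, 14)        # font size 14
            left, top, right, bottom = font.getbbox(word)   # tight text bounds
            w, h = right - left, bottom - top
            # White grayscale canvas with a small margin around the word.
            img = Image.new("L", (w + 10, h + 10), color=255)
            draw = ImageDraw.Draw(img)
            draw.text((5 - left, 5 - top), word, fill=0, font=font)
            img.save(os.path.join(out_dir, "%d.png" % sample_id))
            # Matching ground-truth text file for the recognition model.
            with open(os.path.join(out_dir, "%d.txt" % sample_id), "w") as f:
                f.write(word)
            sample_id += 1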

In the above code, images of English words and their corresponding text files are generated using different font types with a font size of 14. The segmented images will look like the ones below:

Five segmented images generated from the above code

Annotation Tools and Manual Data:

Another way to create a text segmentation dataset is by using annotation tools. In this case, you need to collect images manually or get them from the internet, and then manually annotate the text in the images with bounding boxes. Annotation tools like LabelImg work well for this.

That’s all for generating the dataset. In the next blog, we will see the image preprocessing steps to apply to these datasets. Hope you enjoy reading.

Next Blog: Optical Character Recognition Pipeline: Image Preprocessing

If you have any doubts/suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API – Part II

In the previous blog, we did two things: first, we created a dataset, and second, we split it into training and test sets. In this blog, we will learn how to convert this dataset into the TFRecord format for training.

Before creating the TFRecord files, we just need one more step. In the last blog, we generated XML files using LabelImg. To get labels for the training and test datasets, we need to convert these XML files into CSV format. To do this, we will use the following code, which has been taken from this repository.
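Here is a condensed sketch of that conversion; the images/train and images/test folder names are assumptions, and the original lives in the linked repository.

    import glob
    import os
    import xml.etree.ElementTree as ET
    import pandas as pd

    def xml_to_csv(path):
        # One row per bounding box found in any LabelImg XML file under `path`.
        rows = []
        for xml_file in glob.glob(os.path.join(path, "*.xml")):
            root = ET.parse(xml_file).getroot()
            for member in root.findall("object"):
                rows.append((root.find("filename").text,
                             int(root.find("size/width").text),
                             int(root.find("size/height").text),
                             member.find("name").text,
                             int(member.find("bndbox/xmin").text),
                             int(member.find("bndbox/ymin").text),
                             int(member.find("bndbox/xmax").text),
                             int(member.find("bndbox/ymax").text)))
        columns = ["filename", "width", "height", "class",
                   "xmin", "ymin", "xmax", "ymax"]
        return pd.DataFrame(rows, columns=columns)

    def main():
        # Assumed layout: the XML files sit in images/train and images/test.
        for split in ["train", "test"]:
            df = xml_to_csv(os.path.join("images", split))
            df.to_csv("%s_labels.csv" % split, index=False)

    if __name__ == "__main__":
        main()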

In the main function above, you should specify the path to your XML files for both the train and test folders. The generated CSV files will contain the filename, width, and height of each image, the output label, and the coordinates of the annotated rectangular boxes, as shown below.
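For reference, the generated rows look something like this (the values here are purely illustrative):

    filename,width,height,class,xmin,ymin,xmax,ymax
    up_001.jpg,640,480,up,120,85,310,290
    down_004.jpg,640,480,down,95,60,280,255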

Once you have your train and test images with labels in CSV format, let’s convert the data into the TFRecord format.

A TFRecord file stores your data as a sequence of binary strings, which has many advantages over plain data formats. To do this, we will use the following code, which has been taken from this repository. According to your requirements, you need to change the condition for labels in the label-mapping part of the code below (line 31 of the original script).
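Below is a condensed sketch of that script along the lines of the linked repository, assuming the TF 1.x API; the hand-gesture class names in class_text_to_int are placeholders to adapt to your own labels.

    import os
    import argparse
    import pandas as pd
    import tensorflow as tf
    from object_detection.utils import dataset_util

    def class_text_to_int(row_label):
        # The "condition for labels": map class names to integer ids.
        # These gesture classes are assumptions -- edit for your own labels.
        return {"up": 1, "down": 2, "left": 3, "right": 4}.get(row_label)

    def create_tf_example(filename, group, image_dir):
        # Read the raw encoded JPEG bytes for this image.
        with tf.gfile.GFile(os.path.join(image_dir, filename), "rb") as fid:
            encoded_jpg = fid.read()
        width = int(group.iloc[0]["width"])
        height = int(group.iloc[0]["height"])
        return tf.train.Example(features=tf.train.Features(feature={
            "image/height": dataset_util.int64_feature(height),
            "image/width": dataset_util.int64_feature(width),
            "image/filename": dataset_util.bytes_feature(filename.encode("utf8")),
            "image/source_id": dataset_util.bytes_feature(filename.encode("utf8")),
            "image/encoded": dataset_util.bytes_feature(encoded_jpg),
            "image/format": dataset_util.bytes_feature(b"jpg"),
            # Box coordinates are stored normalized to [0, 1].
            "image/object/bbox/xmin": dataset_util.float_list_feature(
                (group["xmin"] / width).tolist()),
            "image/object/bbox/xmax": dataset_util.float_list_feature(
                (group["xmax"] / width).tolist()),
            "image/object/bbox/ymin": dataset_util.float_list_feature(
                (group["ymin"] / height).tolist()),
            "image/object/bbox/ymax": dataset_util.float_list_feature(
                (group["ymax"] / height).tolist()),
            "image/object/class/text": dataset_util.bytes_list_feature(
                [c.encode("utf8") for c in group["class"]]),
            "image/object/class/label": dataset_util.int64_list_feature(
                [class_text_to_int(c) for c in group["class"]]),
        }))

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--csv_input", required=True)
        parser.add_argument("--image_dir", required=True)
        parser.add_argument("--output_path", required=True)
        args = parser.parse_args()

        writer = tf.python_io.TFRecordWriter(args.output_path)
        # Group the CSV rows by image so multi-box images become one example.
        for filename, group in pd.read_csv(args.csv_input).groupby("filename"):
            example = create_tf_example(filename, group, args.image_dir)
            writer.write(example.SerializeToString())
        writer.close()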

Save this code in a file named generate_tfrecord.py. Now, in order to use this code, we first need to clone the tensorflow object detection API. For that, do the following:
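The standard clone of the TensorFlow models repository looks like this:

    git clone https://github.com/tensorflow/models.git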

Then we need to do the following steps to avoid getting a protoc error:

  1. Go to this release link and download protobuf according to your operating system.
  2. Extract the downloaded file and go to the bin folder inside it.
  3. Copy the protoc.exe file and put it in the models -> research -> object_detection -> protos folder.
  4. In the protos folder, run the following command for the .proto files.
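The usual invocation compiles all the .proto files into Python modules; it is typically run from the models/research directory (adjust the path if you run it from inside the protos folder):

    protoc object_detection/protos/*.proto --python_out=.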

After cloning this repository, copy generate_tfrecord.py inside the models -> research folder and run the following command:
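With the argparse flags used in the sketch above, the two runs would look like this (the CSV and image paths are assumptions):

    python generate_tfrecord.py --csv_input=train_labels.csv --image_dir=images/train --output_path=train.record
    python generate_tfrecord.py --csv_input=test_labels.csv --image_dir=images/test --output_path=test.record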

The above commands will generate two files named train.record and test.record, which will be used for training the model.

That is all for generating the TFRecord files; in the next blog, we will perform training and testing of the object detection model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part III

Hope you enjoy reading.

If you have any doubts/suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Snake Game Using Tensorflow Object Detection API

Here, we will learn how to use the tensorflow object detection API with the computer’s webcam to play a snake game. We will use hand gestures instead of the keyboard.

We will use the following steps to play the snake game using the tensorflow object detection API:

  1. Generate dataset.
  2. Convert train and test datasets into tfrecord format.
  3. Train a pre-trained model using generated data.
  4. Integrate trained model with snake game.
  5. Play the snake game using your own hand gestures.

In this blog, we will cover only the first step, i.e. how to create your own training data; the remaining steps will be covered in the subsequent blogs.

You can find code here.

Generate Dataset

A snake game generally involves four directions of movement, i.e. up, down, right and left. We need to generate at least 100 images for each of the four directions. You can use your phone or laptop camera to do this (a simple capture loop is sketched after the example images). Try to capture images with different backgrounds for better generalization. Below are some examples of images of hand gestures.

Hand Gestures
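If you want to capture frames from a laptop webcam, a minimal OpenCV loop like the sketch below can help; the label name and output folder are placeholders.

    import os
    import cv2

    label = "up"                      # change this for each gesture class
    os.makedirs("images", exist_ok=True)

    cap = cv2.VideoCapture(0)         # default webcam
    count = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        cv2.imshow("capture", frame)
        key = cv2.waitKey(1) & 0xFF
        if key == ord(" "):           # space bar saves the current frame
            cv2.imwrite("images/%s_%03d.jpg" % (label, count), frame)
            count += 1
        elif key == ord("q"):         # 'q' quits
            break
    cap.release()
    cv2.destroyAllWindows()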

Now we have our captured images of hand gestures. The next thing is to annotate these images according to their classes, which means we need to create rectangular boxes around the hand gestures and label them appropriately. Don’t worry, there is a tool named LabelImg which is very helpful for annotating images to create training and test datasets. To get started with LabelImg, you can follow their GitHub link. The start screen of LabelImg would look like this.

On the left side of the screen, you can find various options. Click on Open dir and choose the input image folder. Then click on Change save dir and select the output folder where the generated XML files will be saved. Each XML file will contain the coordinates of the rectangular boxes you draw in the corresponding image, something like this:
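Here is a trimmed example of the Pascal VOC-style XML that LabelImg writes; the filename and coordinates are illustrative.

    <annotation>
        <folder>images</folder>
        <filename>up_001.jpg</filename>
        <size>
            <width>640</width>
            <height>480</height>
            <depth>3</depth>
        </size>
        <object>
            <name>up</name>
            <bndbox>
                <xmin>120</xmin>
                <ymin>85</ymin>
                <xmax>310</xmax>
                <ymax>290</ymax>
            </bndbox>
        </object>
    </annotation>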

To create a rectangular box in an image using LabelImg, you just need to press ‘W’, then drag to create the box and save it. You can create one or multiple boxes in one image, as shown in the figure below. Repeat this for all the images.

Now we have images and their corresponding XML files. Next, we will split this dataset into training and testing sets in a 90/10 ratio: put 90% of the images of each class (‘up’, ‘right’, ‘left’ and ‘down’) and their corresponding XML files in one folder, and the other 10% in another folder, as sketched below.
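Here is a small sketch of that split, assuming all images and XML files sit in a single images/ folder, each filename starts with its class name, and the images are .jpg files.

    import os
    import random
    import shutil

    src = "images"
    random.seed(0)  # reproducible split

    for split in ["train", "test"]:
        os.makedirs(os.path.join(src, split), exist_ok=True)

    for label in ["up", "down", "left", "right"]:
        # Collect the file stems (image + XML share a name) for this class.
        stems = sorted({f.rsplit(".", 1)[0] for f in os.listdir(src)
                        if f.startswith(label) and f.endswith(".jpg")})
        random.shuffle(stems)
        n_train = int(0.9 * len(stems))          # 90/10 split per class
        for i, stem in enumerate(stems):
            split = "train" if i < n_train else "test"
            for ext in (".jpg", ".xml"):
                shutil.move(os.path.join(src, stem + ext),
                            os.path.join(src, split, stem + ext))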

That’s all for creating the dataset. In the next blog, we will see how to create TFRecord files from these datasets, which will be used for training the model.

Next Blog: Snake Game Using Tensorflow Object Detection API – Part II

Hope you enjoy reading.

If you have any doubts/suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.