Keras is a high-level API that can run on top of TensorFlow, CNTK, and Theano. Keras is preferable because it is easy and fast to learn. In this blog, we will learn about a set of functions called callbacks, which are used during training in Keras.
Callbacks provide some advantages over plain training in Keras. Here I will explain the important ones.
A callback can terminate training when a NaN loss occurs.
A callback can save the model after every epoch; you can also save only the best model.
Early stopping: a callback can stop training when accuracy stops improving.
Terminate the training when a NaN loss occurs
Let's see how to terminate training when a NaN loss occurs:
keras.callbacks.TerminateOnNaN()
Saving Model using Callbacks
To save the model after every epoch in Keras, we need to import ModelCheckpoint from keras.callbacks. Let's see the code below, which saves the model whenever the validation loss decreases.
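A minimal sketch of such a checkpoint setup (the filepath and the monitored metric below are illustrative):

```python
from keras.callbacks import ModelCheckpoint

# Save a checkpoint only when the validation loss decreases.
# "best_model.hdf5" is an illustrative filepath; a fixed name
# means each new best model overwrites the previous one.
checkpoint = ModelCheckpoint('best_model.hdf5',
                             monitor='val_loss',
                             verbose=1,
                             save_best_only=True,
                             mode='min')
callbacks_list = [checkpoint]

# Then pass the list into model.fit() via the callbacks parameter:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=10, callbacks=callbacks_list)
```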
In the above code, we first create a ModelCheckpoint object by passing its required parameters.
"filepath" defines the path where the checkpoints will be saved. If you want to keep only the best model, pass a fixed filepath such as "best_model.hdf5", which will overwrite the previously saved checkpoint.
"monitor" decides which quantity to monitor while training.
"save_best_only" saves a checkpoint only when the monitored quantity improves.
"mode" is one of {auto, min, max}: with max, the model is saved when the monitored quantity reaches a new maximum; with min, when it reaches a new minimum; with auto, the direction is inferred from the name of the monitored quantity.
Then finally, make a callbacks list and pass it into model.fit() via the callbacks parameter.
Early Stopping
Callbacks can stop training when a monitored quantity has stopped improving. Let's see how:
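A sketch of an EarlyStopping callback (the monitored metric and patience value below are illustrative):

```python
from keras.callbacks import EarlyStopping

# Stop training when the monitored quantity has not improved
# for `patience` consecutive epochs.
early_stop = EarlyStopping(monitor='val_acc',
                           min_delta=0,
                           patience=3,
                           mode='max')

# Pass it to training just like any other callback:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=50, callbacks=[early_stop])
```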
A screensaver is a computer program that fills the screen with anything you wish when the computer is left idle for some time. Most of you might have used a screensaver on your laptops, TV etc. In the good old days, they used to fascinate most of us. In this blog, we will be creating a bouncing ball screensaver using OpenCV-Python.
Task:
Create a window that we can write text on. If nothing is typed for 10 seconds, the screensaver will start.
For this we need to do two things:
First, we need to check whether a key is pressed within the specified time. Here, I have used 10 seconds.
Second, create a bouncing ball screensaver and display it only if no key is pressed in the specified time, otherwise, display the original screen.
The first part can be done using the OpenCV cv2.waitKey() function which waits for a specific time for a key press (See here for more details).
For the second part, we first need to create a bouncing ball screensaver. The main idea is to change the sign of increment (dx and dy in the code below) on collision with the boundaries. This can be done using the following code
import cv2
import numpy as np

def screensaver():
    img = np.zeros((480, 640, 3), dtype='uint8')
    dx, dy = 1, 1
    x, y = 100, 100
    while True:
        # Display the image
        cv2.imshow('a', img)
        k = cv2.waitKey(10)
        img = np.zeros((480, 640, 3), dtype='uint8')
        # Increment the position
        x = x + dx
        y = y + dy
        cv2.circle(img, (x, y), 20, (255, 0, 0), -1)
        if k != -1:
            break
        # Change the sign of increment on collision with the boundary
        if y >= 480 or y <= 0:
            dy *= -1
        if x >= 640 or x <= 0:
            dx *= -1
    cv2.destroyAllWindows()
The snapshot of the screensaver looks like this
Now, we need to integrate this screensaver function with the cv2.waitKey() function as shown in the code below
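A sketch of the integration, assuming the screensaver() function defined above; the window name, text position, and 10-second wait are illustrative:

```python
import cv2
import numpy as np

# screensaver() is the function defined earlier in this post

img = np.zeros((480, 640, 3), dtype='uint8')
text = ''
while True:
    cv2.imshow('a', img)
    # Wait up to 10 seconds (10000 ms) for a key press
    k = cv2.waitKey(10000)
    if k == -1:
        # No key pressed within 10 seconds: start the screensaver
        screensaver()
    elif k == 27:
        # Esc quits the program
        break
    else:
        # Otherwise draw the typed character on the window
        text += chr(k)
        cv2.putText(img, text, (50, 240), cv2.FONT_HERSHEY_SIMPLEX,
                    1, (255, 255, 255), 2)
cv2.destroyAllWindows()
```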
Screen time of an actor in a movie or an episode is very important. Many actors get paid according to their total screen time. Moreover, we also want to know how much time our favorite character acted on screen. So, have you ever wondered how you can calculate the total screen time of an actor? One plausible answer is with deep learning.
With the advancement of deep learning, it is now possible to solve various difficult problems. In this blog, we will learn how to use transfer learning and image classification concepts of deep learning to calculate the screen time of an actor.
To solve any problem with deep learning, the first requirement is the data. For this tutorial, we will use a video clip from the famous TV show “Friends”. We are going to calculate the screen time of my favorite character “Ross”.
Creating Dataset
First, we need to get a video. To do this, I have downloaded a video from YouTube using the pytube library. For more understanding of pytube, you can follow this blog or use the following code to get started.
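A minimal sketch using pytube (the video link below is a placeholder):

```python
from pytube import YouTube

# Placeholder link: substitute the URL of the clip you want
video_link = 'https://www.youtube.com/watch?v=...'

vid = YouTube(video_link)
stream = vid.streams.first()
# Saves the video into the current working directory
stream.download()
```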
Now we have our data in the form of a video, which is nothing but a group of frames (images). Since we are going to solve this problem using image classification, we need to extract the images from this video. For this task, I have used OpenCV as shown below.
import cv2

# Opens the video file
cap = cv2.VideoCapture('Friends - Unagi.mp4')
i = 0
image_folder = 'img'
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imwrite(image_folder + '/' + str(i) + '.jpg', frame)
    i += 1
cap.release()
cv2.destroyAllWindows()
The video is now converted into individual frames. In this problem, there are two classes: "Ross" and "No Ross". To create a dataset, we need to separate the images into these two classes manually. For this, I have created a folder named "data" that has two sub-folders, "ross" and "no_ross", and then manually added images to them. After creating the dataset, we are ready to dive into the code and concepts.
Input Data and Preprocessing
We have our data in the form of images. To prepare this data as input to our neural network, we need to do some preprocessing with the following steps:
Read all images one by one using OpenCV
Resize each image to (224, 224, 3) for the input to the model
Divide the data by 255 to bring the input features of the neural network into the same range
Since we have only 6,814 images, it will be difficult to train a neural network on such a small dataset. Here comes the concept of transfer learning.
With the help of transfer learning, we can reuse the features generated by a model trained on a large dataset in our own model. Here we will use the VGG16 model trained on the "imagenet" dataset. For this, we are using Keras, TensorFlow's high-level API. With Keras, you can directly import the VGG16 model as shown in the code below.
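A sketch of loading VGG16 via Keras (the imagenet weights are downloaded on first use):

```python
from keras.applications.vgg16 import VGG16

# Load VGG16 with imagenet weights, without the fully connected top layers
vgg_model = VGG16(weights='imagenet', include_top=False,
                  input_shape=(224, 224, 3))
vgg_model.summary()
```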
The VGG16 model trained on the imagenet dataset predicts over a thousand classes, but in this problem we only need a single binary output, "Ross" or "No Ross". That's why we use include_top = False above, which means we do not include the fully connected layers from the VGG16 model. Now we will pass our input data to vgg_model and generate the features.
vgg_class1 = vgg_model.predict(class1_data)
vgg_class2 = vgg_model.predict(class2_data)
Network Architectures
Since we are not including the fully connected layers from VGG16, we need to create a model with some fully connected layers and an output layer with a single unit ("Ross" vs "No Ross"). The output features from the VGG16 model have shape 7*7*512, which, flattened, becomes the input shape for our model. Here I am also using dropout layers to make the model less prone to over-fitting. Let's see the code:
from keras.layers import Input, Dense, Dropout
from keras.models import Model

inputs = Input(shape=(7*7*512,))
dense1 = Dense(1024, activation='relu')(inputs)
drop1 = Dropout(0.5)(dense1)
dense2 = Dense(512, activation='relu')(drop1)
drop2 = Dropout(0.5)(dense2)
outputs = Dense(1, activation='sigmoid')(drop2)
model = Model(inputs, outputs)
model.summary()
Splitting Data into Train and Validation
Now we have input features from the VGG16 model and our own network architecture defined above. The next step is to train this neural network, but we still need validation data. We have 6,814 images, so we will split them into 5,000 training images and 1,814 validation images.
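One way to sketch the labeling, shuffling, and splitting of the VGG16 features (the helper name is ours; pass in the vgg_class1/vgg_class2 features computed above):

```python
import numpy as np

def make_train_valid(class1_feats, class2_feats, n_train):
    """Label, shuffle and split VGG16 features into train/validation sets."""
    n1, n2 = len(class1_feats), len(class2_feats)
    # Flatten each 7x7x512 feature map into one vector per image
    X = np.concatenate((class1_feats, class2_feats)).reshape(n1 + n2, -1)
    y = np.concatenate((np.ones(n1),      # "Ross"
                        np.zeros(n2)))    # "No Ross"
    # Shuffle, then split off the first n_train samples for training
    idx = np.random.permutation(len(X))
    X, y = X[idx], y[idx]
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# X_train, y_train, X_valid, y_valid = make_train_valid(vgg_class1, vgg_class2, 5000)
```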
All set, we are ready to train our model. Here, we will use stochastic gradient descent as the optimizer and binary cross-entropy as our loss function. We are also going to save a checkpoint for the best model according to its validation accuracy.
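A sketch of the training call, assuming the `model` and the train/validation splits defined above (the checkpoint filename, epochs, and batch size are illustrative):

```python
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint

# `model`, X_train, y_train, X_valid, y_valid come from the steps above
model.compile(optimizer=SGD(), loss='binary_crossentropy',
              metrics=['accuracy'])

# Keep only the model with the best validation accuracy
checkpoint = ModelCheckpoint('best_model.hdf5', monitor='val_acc',
                             save_best_only=True, mode='max')

model.fit(X_train, y_train, epochs=10, batch_size=64,
          validation_data=(X_valid, y_valid),
          callbacks=[checkpoint])
```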
Training and validation accuracy looks quite pleasing. Now let’s calculate screen time of “Ross”.
Calculating Screen Time
To test our trained model and calculate the screen time, I downloaded another "Friends" video clip from YouTube and extracted its images. To calculate the screen time, I first used the trained model to predict, for each image, which class it belongs to: "Ross" or "No Ross". Since the video runs at 24 frames per second, we count the number of frames predicted as containing "Ross" and divide by 24 to get the number of seconds "Ross" was on screen.
This test video clip runs at 24 frames per second, and the number of images predicted as containing "Ross" is 4715. So the screen time for Ross is 4715/24 ≈ 196 seconds.
Summary
We can see good accuracy on the train and validation datasets, but when I tested the model on the test dataset, the accuracy was about 65%. One reason I figured out is too little training data; if you can get more data, the accuracy can be higher. Another reason can be covariate shift, which means the test dataset is quite different from the training dataset, here due to different video quality.
This type of technique can be very helpful in calculating screen time of a particular character.
Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In this blog, we will learn how to add an image to a live camera feed using OpenCV-Python. This is also known as image blending: we take the weighted sum of two images, and these weights give a feeling of blending or transparency.
Images are added as per the equation below:
dst = α·img1 + β·img2 + γ   (for blending, β = 1 − α)
Since an image is a matrix, for the above equation to hold, both img1 and img2 must be of equal size.
OpenCV has a built-in function that does the exact same thing as shown below
The idea is that first, we will select which image we want to overlay (another image will serve as the background). Then we need to select the region in the background image where we want to put the overlay image. Add this selected region with the overlay image using the above equation. At last change the region in the background image with the result obtained in the previous line.
I hope you understand the idea. Now, let’s get started
Task:
Overlay a white square image on the live webcam feed with different weights. Instead of manually giving the weights, set two keys which, on pressing, increase or decrease the weights.
Steps:
Take an image which you want to overlay. Here, I have used a small white square created using numpy. You can use any.
Open the camera using cv2.VideoCapture()
Initialize the weights (alpha).
Until the camera is opened
Read the frame using cap.read()
Select the region in the frame where we want to add the image and add the images using cv2.addWeighted()
Change the region in the frame with the result obtained
Display the current value of weights using cv2.putText()
Display the image using cv2.imshow()
On pressing ‘a’ increase the value of alpha by 0.1 and decrease by the same amount on pressing ‘d’
Press ‘q’ to break
Code:
import cv2
import numpy as np

# create an overlay image. You can use any image
foreground = np.ones((100, 100, 3), dtype='uint8') * 255
# Open the camera
cap = cv2.VideoCapture(0)
# Set initial value of weights
alpha = 0.4
while True:
    # read the background
    ret, background = cap.read()
    background = cv2.flip(background, 1)
    # Select the region in the background where we want to add the image and add the images using cv2.addWeighted()
    added_image = cv2.addWeighted(background[150:250, 150:250, :], alpha, foreground[0:100, 0:100, :], 1 - alpha, 0)
    # Change the region in the background with the result obtained
    background[150:250, 150:250] = added_image
    # Display the current value of weights using cv2.putText()
    cv2.putText(background, 'alpha: {:.1f}'.format(alpha), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.imshow('a', background)
    k = cv2.waitKey(10)
    # 'a' increases alpha by 0.1, 'd' decreases it, 'q' breaks
    if k == ord('a'):
        alpha = min(alpha + 0.1, 1.0)
    elif k == ord('d'):
        alpha = max(alpha - 0.1, 0.0)
    elif k == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Most of you must have taken a photograph with a timer. This feature sets a countdown before the photograph is taken. In this tutorial, we will be doing the same, i.e. creating our own camera timer using OpenCV-Python. Sounds interesting, so let's get started.
The main idea is that whenever a particular key is pressed (here, I have used 'q'), the countdown will begin, and a photo will be taken and saved at the desired location. Otherwise, the video will continue streaming.
Here, we will be using cv2.putText() function for drawing the countdown on the video. This function has the following arguments
This function draws the text on the input image at the specified position. If the specified font is unable to render any character, it is replaced by a question mark.
Now let’s see how to do this
Steps:
Open the camera using cv2.VideoCapture()
Until the camera is open
Read the frame and display it using cv2.imshow()
Set the countdown. Here, I have taken it as 30, and I am displaying it only after every 10 frames so that it is easily visible. Otherwise, it will be too fast. You can set it to anything you wish.
Set a key for the countdown to begin
If the key is pressed, show the countdown on the video using cv2.putText(). As the countdown finishes, save the frame at the desired location.
Otherwise, the video will continue streaming
On pressing ‘Esc’ the video will stop streaming.
Code:
import cv2

# Open the camera
cap = cv2.VideoCapture(0)
while True:
    # Read and display each frame
    ret, img = cap.read()
    cv2.imshow('a', img)
    k = cv2.waitKey(125)
    # Specify the countdown
    j = 30
    # set the key for the countdown to begin
    if k == ord('q'):
        while j >= 10:
            ret, img = cap.read()
            # Display the countdown after 10 frames so that it is easily visible otherwise,
            # it will be fast. You can set it to anything or remove this condition and put
            # countdown on each frame
            if j % 10 == 0:
                # specify the font and draw the countdown using puttext
                font = cv2.FONT_HERSHEY_SIMPLEX
                cv2.putText(img, str(j // 10), (250, 250), font, 7, (255, 255, 255), 10, cv2.LINE_AA)
            cv2.imshow('a', img)
            cv2.waitKey(125)
            j = j - 1
        # When the countdown finishes, save the frame at the desired location
        ret, img = cap.read()
        cv2.imwrite('camera_timer.jpg', img)  # change the path/name as desired
        cv2.imshow('a', img)
        cv2.waitKey(2000)
    # Press Esc to stop streaming
    elif k == 27:
        break
cap.release()
cv2.destroyAllWindows()
YouTube is a rich source of videos, and downloading videos from YouTube is a little difficult. There are some extensions and downloaders available, but those are sometimes not recommended. In Python, however, it is quite easy to download a video from YouTube.
So, in this blog we will learn how to download videos from YouTube directly using the video link.
To download videos from YouTube, we will use a Python library named 'pytube'. First, you need to install 'pytube' using the following command.
pip install pytube
Now we are having all necessary libraries required for this task. Now import the required module from ‘pytube’ library.
from pytube import YouTube
The only thing that you will need is the video's link.
Then create an object of the imported 'YouTube' class using this video link.
vid = YouTube(video_link)
After getting the object, we access its streams and take the first result, which will be our required video.
stream = vid.streams.first()
Now we have our video stream, to download it just call the following function.
stream.download()
The above code will download the video into the current working directory, with the same name the video has on YouTube. To download it to a specific folder with a specific name, you need to pass two arguments as shown below.
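A sketch of the call with both arguments; the folder and filename below are placeholders:

```python
from pytube import YouTube

vid = YouTube(video_link)   # video_link as defined earlier
stream = vid.streams.first()

# output_path selects the destination folder,
# filename sets the saved file's name (both values are placeholders)
stream.download(output_path='videos', filename='my_video')
```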
Have you seen the security cameras output where DateTime continuously keeps updating? In this blog, we will be doing the same using OpenCV-Python i.e. we will put current DateTime on the live webcam feed. So, let’s get started.
For fetching current DateTime, we will be using Python’s DateTime module. The following code shows how to get the current DateTime
# get current DateTime
from datetime import datetime
print(datetime.now())
To put the DateTime on the live video, we will be using cv2.putText() on each frame as shown below
Web scraping is a way to extract information from the internet in an automated fashion. We all know that YouTube is a huge resource of data, having tons of videos with related information like views, comments, etc. In this blog, we will learn how to use web scraping in Python to extract video information from a YouTube search. From each search result, we will extract the number of views and the video heading.
To get started, we first need to install two important libraries: "requests" to get the response from a YouTube search result, and "Beautiful Soup" to parse this response into HTML content.
pip install requests
pip install -U bs4
Now that we have installed the required libraries, let's get started.
Import the libraries
from bs4 import BeautifulSoup as bs
import requests
Whenever you search on YouTube, it creates a base search URL and then appends your search query to complete it. Let's say we search "theailearner" on YouTube. The base search URL and query can be defined as follows.
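A sketch of building that URL (the query string is the example from the text):

```python
# YouTube builds search results from a base URL plus the query
base = 'https://www.youtube.com/results?search_query='
query = 'theailearner'
URL = base + query
```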
Now, we will scrape the data from this URL using ” requests ” library.
response = requests.get(URL)
page = response.text
Once we have scraped the data, we will parse it into HTML using Beautiful Soup and find all the video information in the search results. To extract a particular piece of information, we will use the corresponding class from the HTML data.
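A sketch of the parsing step; note that the "yt-uix-tile-link" class is what YouTube's older, server-rendered results pages used for video links, and it may have changed since:

```python
from bs4 import BeautifulSoup as bs

def find_videos(page):
    """Parse the response text and return the video link anchors."""
    soup = bs(page, 'html.parser')
    # class name used by the older, server-rendered results page
    return soup.findAll('a', attrs={'class': 'yt-uix-tile-link'})

# vids = find_videos(page)   # `page` from the request above
```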
The soup.findAll() function used above gives the required data, but to make it easily readable we need to run a simple Python script.
for v in vids:
    print(v['title'])
    v = str(v)
    views = ''
    try:
        # Walk backwards from the word "views" to collect the number
        indx = v.index('views')
        indx = indx - 2
        while v[indx] != ' ':
            views = views + v[indx]
            indx = indx - 1
        print(views[::-1])
    except (ValueError, IndexError):
        continue
Now you might have got some feeling for how to scrape data from YouTube. We can also scrape other data from YouTube, like video information from a channel, comments on a video, likes, dislikes, etc.
Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In this blog, we will learn how to use the OpenCV cv2.putText() function for writing text on images in real-time. Most of you might have used cv2.putText(); if not, this is how it looks:
The above function draws the text on the input image at the specified position. If the specified font is unable to render any character, it is replaced by a question mark.
Another important thing that we will be using is the OpenCV cv2.waitKey() function. This returns -1 when no key is pressed; otherwise it returns the ASCII value of the key pressed, or a 32-bit integer value depending upon the platform or keyboard modifier (Num Lock etc.). You can find this by printing the key as shown below.
import cv2
import numpy as np

img = np.zeros((500, 500, 3), dtype='uint8')  # Create a dummy image
while True:
    cv2.imshow('a', img)
    k = cv2.waitKey(0)
    print(k)
    if k == ord('q'):
        break
cv2.destroyAllWindows()
If it returns a 32-bit integer, then use cv2.waitKey() & 0xFF, which keeps only the last 8 bits of the original 32-bit value.
ord(‘q’) converts the character to an int while chr(113) does exactly the opposite as shown in the code below.
>>> ord('q')
113
>>> chr(113)
'q'
I hope you understand all this, now let’s get started
Steps:
Read the image and initialize the counter that will be used for changing the position of the text.
Inside an infinite while loop,
display the image and use cv2.waitKey() for a keypress.
Convert this key into a character using chr() and draw it on the image using cv2.putText().
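The steps above can be sketched as follows (the window name, text position, and counter spacing are illustrative):

```python
import cv2
import numpy as np

img = np.zeros((500, 500, 3), dtype='uint8')   # or cv2.imread(...)
font = cv2.FONT_HERSHEY_SIMPLEX
i = 0   # counter used to shift the position of each new character
while True:
    cv2.imshow('image', img)
    k = cv2.waitKey(0)
    if k == 27:    # Esc to exit
        break
    # Convert the key into a character and draw it on the image
    cv2.putText(img, chr(k), (20 + 20 * i, 250), font, 1, (255, 0, 0), 2)
    i += 1
cv2.destroyAllWindows()
```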
In the previous blog, we discussed how to write text on images in real-time. In that, we manually specified the position for text placement. This is quite tedious if we were to write text at multiple positions.
So, what if we automate this process? That is, we automatically get the coordinates of the image where we click and then put text at that position using the cv2.putText() function, as we did in the previous blog.
This is what we will do in this blog i.e. write text on images at mouse click position. To do this, we will create a mouse callback function and then bind this function to the image window.
Mouse callback function is executed whenever a mouse event takes place. Mouse event refers to anything we do with the mouse like double click, left click etc. All available events can be found using the following code
import cv2
events = [i for i in dir(cv2) if 'EVENT' in i]
print(events)
Below is an example of a simple mouse callback function that draws a circle where we double click.
# mouse callback function
def draw_circle(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDBLCLK:
        cv2.circle(img, (x, y), 100, (255, 0, 0), -1)
We then need to bind this callback function to the image window. This is done using cv2.setMouseCallback(window_name, mouse_callback_function) as shown below
cv2.setMouseCallback('img', draw_circle)
I hope you understood the mouse callback function; now let's get started.
Steps:
Create a mouse callback function where on every left double click position we put text on the image.
Create or read an image.
Create an image window using cv2.namedWindow()
Bind the mouse callback function to the image window using cv2.setMouseCallback()
Display the new image using an infinite while loop
Code:
import cv2
import numpy as np

font = cv2.FONT_HERSHEY_SIMPLEX

# mouse callback function: put text at the double-click position
def draw_circle(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDBLCLK:
        cv2.putText(img, 'text', (x, y), font, 1, (255, 0, 0), 2)

img = np.zeros((500, 500, 3), dtype='uint8')   # create or read an image
cv2.namedWindow('image')
cv2.setMouseCallback('image', draw_circle)
while True:
    cv2.imshow('image', img)    # to display the characters
    if cv2.waitKey(20) == 27:   # Esc to exit
        break
cv2.destroyAllWindows()