Tag Archives: keras

Calculating Screen Time of an Actor using Deep Learning

Screen time of an actor in a movie or an episode is very important. Many actors get paid according to their total screen time. Moreover, we also want to know how much time our favorite character acted on screen. So, have you ever wondered how can you calculate the total screen time of an actor? One of the plausible answer is with deep learning.

With the advancement of deep learning now its possible to solve various difficult problems. In this blog, we will learn how to use transfer learning and image classification concepts of deep learning to calculate the screen time of an actor.

To solve any problem with deep learning, the first requirement is the data. For this tutorial, we will use a video clip from the famous TV show “Friends”. We are going to calculate the screen time of my favorite character “Ross”.

Creating Dataset

First, we need to get a video. To do this I have downloaded a video from YouTube using pytube library. For more understanding of pytube, you can follow this blog or use the following code to get started.

from pytube import YouTube as yt

video_link = 'https://www.youtube.com/watch?v=jbRVoTL5djs'
vid = yt(video_link)

stream = vid.streams.first()
stream.download()

from pytube import YouTube as yt

video_link = 'https://www.youtube.com/watch?v=jbRVoTL5djs'

vid = yt(video_link)

stream = vid.streams.first()

stream.download()

Now we have our data in the form of a video which is nothing but a group of frames( images). Since we are going to solve this problem using image classification, we need to extract the images from this video. For this task, I have used OpenCV as shown below

# Opens the Video file
cap= cv2.VideoCapture('Friends - Unagi.mp4')
i=0

image_folder = 'img'
while True:
    ret, frame = cap.read()
    
    if ret == False:
        break
    cv2.imwrite(image_folder+'/'+str(i)+'.jpg',frame)
    i+=1

cap.release()
cv2.destroyAllWindows()

# Opens the Video file

cap= cv2.VideoCapture('Friends - Unagi.mp4')

i=0

image_folder = 'img'

while True:

ret, frame = cap.read()

if ret == False:

break

cv2.imwrite(image_folder+'/'+str(i)+'.jpg',frame)

i+=1

cap.release()

cv2.destroyAllWindows()

The video is now converted into individual frames. In this problem, there is only one class, either “Ross” or “No Ross”. To create a dataset, we need to separate images according to these two manually. For this, I have created a folder named “data” which is having two sub-folder “ross” and “no_ross”. Then manually added images to these two sub-folders. After creating dataset we are ready to dive into the code and concepts.

Input Data and Preprocessing

We are having data in the form of images. To prepare this data for input to our neural network, we need to do some preprocessing with the following steps:

Read all images one by one using openCV
Resize each image to (224, 224, 3) for the input to the model
Divide the data by 255 to make input features to neural network in the same range
Append to corresponding class

from tqdm import tqdm
import cv2
import os
import numpy as np

img_path = 'D:/Downloads/youtube/train/data_1'

class1_data = []
class2_data = []
for classes in os.listdir(img_path):
        fin_path = os.path.join(img_path, classes)
        for fin_classes in tqdm(os.listdir(fin_path)):
            img = cv2.imread(os.path.join(fin_path, fin_classes))
            img = cv2.resize(img, (224,224))
            img = img/255.
            if classes == 'ross':
                class1_data.append(img)
            else:
                class2_data.append(img)

class1_data = np.array(class1_data)
class2_data = np.array(class2_data)

from tqdm import tqdm

import cv2

import os

import numpy as np

img_path = 'D:/Downloads/youtube/train/data_1'

class1_data = []

class2_data = []

for classes in os.listdir(img_path):

fin_path = os.path.join(img_path, classes)

for fin_classes in tqdm(os.listdir(fin_path)):

img = cv2.imread(os.path.join(fin_path, fin_classes))

img = cv2.resize(img, (224,224))

img = img/255.

if classes == 'ross':

class1_data.append(img)

else:

class2_data.append(img)

class1_data = np.array(class1_data)

class2_data = np.array(class2_data)

Transfer Learning

Since we have only 6814 images, so it will be difficult to train a neural network with this little dataset. Here comes the concept of transfer learning.

With the help of transfer learning, we can use features generated by a model trained on a large dataset into our model. Here we will use VGG16 model trained on “imagenet” dataset. For this, we are using tensorflow high-level API Keras. With keras, you can directly import VGG16 model as shown in the code below.

import keras
from keras.applications import VGG16

vgg_model = VGG16(include_top=False, weights='imagenet')

import keras

from keras.applications import VGG16

vgg_model = VGG16(include_top=False, weights='imagenet')

VGG16 model trained with imagenet dataset predicts on lots of classes, but in this problem, we are only having one class, either “Ross” or “No Ross”. That’s why above we are using include_top = False, which signifies that we are not including fully connected layers from the VGG16 model. Now we will pass our input data to vgg_model and generate the features.

vgg_class1 = vgg_model.predict(class1_data)
vgg_class2 = vgg_model.predict(class2_data)

1 2	vgg_class1 = vgg_model.predict(class1_data) vgg_class2 = vgg_model.predict(class2_data)

Network Architectures

Since we are not including fully connected layers from VGG16 model, we need to create a model with some fully connected layers and an output layer with 1 class, either “Ross” or “No Ross”. Output features from VGG16 model will be having shape 7*7*512, which will be input shape for our model. Here I am also using dropout layer to make model less over-fit. Let’s see the code:

from keras.layers import Input, Dense, Dropout
from keras.models import Model

inputs = Input(shape=(7*7*512,))

dense1 = Dense(1024, activation = 'relu')(inputs)
drop1 = Dropout(0.5)(dense1)
dense2 = Dense(512, activation = 'relu')(drop1)
drop2 = Dropout(0.5)(dense2)
outputs = Dense(1, activation = 'sigmoid')(drop2)

model = Model(inputs, outputs)
model.summary()

from keras.layers import Input, Dense, Dropout

from keras.models import Model

inputs = Input(shape=(7*7*512,))

dense1 = Dense(1024, activation = 'relu')(inputs)

drop1 = Dropout(0.5)(dense1)

dense2 = Dense(512, activation = 'relu')(drop1)

drop2 = Dropout(0.5)(dense2)

outputs = Dense(1, activation = 'sigmoid')(drop2)

model = Model(inputs, outputs)

model.summary()

Splitting Data into Train and Validation

Now we have input features from VGG16 model and our own network architecture defined above. Next thing is to train this neural network. But we are lacking our validation data. We are having 6814 images, so we will split this into 5000 training images and 1814 validation images.

train_data = np.concatenate((vgg_class1[:3000], vgg_class2[:2000]), axis = 0)
train_data = train_data.reshape(train_data.shape[0],7*7*512)

valid_data = np.concatenate((vgg_class1[3000:], vgg_class2[2000:]), axis = 0)
valid_data = valid_data.reshape(valid_data.shape[0],7*7*512)

train_data = np.concatenate((vgg_class1[:3000], vgg_class2[:2000]), axis = 0)

train_data = train_data.reshape(train_data.shape[0],7*7*512)

valid_data = np.concatenate((vgg_class1[3000:], vgg_class2[2000:]), axis = 0)

valid_data = valid_data.reshape(valid_data.shape[0],7*7*512)

According to our created class 1, class 2, training and validation data, we will create our output y labels.

train_label = np.array([0]*vgg_class1[:3000].shape[0] + [1]*vgg_class2[:2000].shape[0])
valid_label = np.array([0]*vgg_class1[3000:].shape[0] + [1]*vgg_class2[2000:].shape[0])

1 2	train_label = np.array([0]vgg_class1[:3000].shape[0] + [1]vgg_class2[:2000].shape[0]) valid_label = np.array([0]vgg_class1[3000:].shape[0] + [1]vgg_class2[2000:].shape[0])

Training the Network

All set, we are ready to train our model. Here, we will use stochastic gradient descent as an optimizer and binary cross-entropy as our loss function. We are also going to save our checkpoint for the best model according to it’s validation dataset accuracy.

import tensorflow as tf
from keras.callbacks import ModelCheckpoint

tf.logging.set_verbosity(tf.logging.ERROR)
model.compile(optimizer = 'sgd', loss = 'binary_crossentropy', metrics = ['accuracy'])

filepath="best_model.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]

import tensorflow as tf

from keras.callbacks import ModelCheckpoint

tf.logging.set_verbosity(tf.logging.ERROR)

model.compile(optimizer = 'sgd', loss = 'binary_crossentropy', metrics = ['accuracy'])

filepath="best_model.hdf5"

checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')

callbacks_list = [checkpoint]

I am using batch size of 64 and 10 epochs to train.

model.fit(train_data, train_label, epochs = 10, batch_size = 64, validation_data = (valid_data, valid_label), verbose = 2, callbacks = callbacks_list)

1	model.fit(train_data, train_label, epochs = 10, batch_size = 64, validation_data = (valid_data, valid_label), verbose = 2, callbacks = callbacks_list)

Training and validation accuracy looks quite pleasing. Now let’s calculate screen time of “Ross”.

Calculating Screen Time

To test our trained model and calculate the screen time, I have downloaded another “friends” video clip from YouTube and extracted images. To calculate the screen time, first I have used the trained model to predict each image to find out which class it belongs, either “Ross” or “No Ross”. Since video is made up of 24 frames per second, we will count the number of frames which has been predicted for having “Ross” in it and then divide it by 24 to count the number of seconds “Ross” was on screen.

import os
import numpy as np

ross_images = []
no_ross_images = []

test_path = 'D:/Downloads/youtube/test/data_4/test_images'

for test in tqdm(os.listdir(test_path)):
    test_img = cv2.imread(os.path.join(test_path, test))
    test_img = cv2.resize(test_img, (224,224))
    test_img = test_img/255.
    test_img = np.expand_dims(test_img, 0)
    pred_img = vgg_model.predict(test_img)
    pred_feat = pred_img.reshape(1, 7*7*512)
    out_class = model.predict(pred_feat)
    if out_class < 0.5:
        ross_images.append(out_class)
    else:
        no_ross_images.append(out_class)

import os

import numpy as np

ross_images = []

no_ross_images = []

test_path = 'D:/Downloads/youtube/test/data_4/test_images'

for test in tqdm(os.listdir(test_path)):

test_img = cv2.imread(os.path.join(test_path, test))

test_img = cv2.resize(test_img, (224,224))

test_img = test_img/255.

test_img = np.expand_dims(test_img, 0)

pred_img = vgg_model.predict(test_img)

pred_feat = pred_img.reshape(1, 7*7*512)

out_class = model.predict(pred_feat)

if out_class < 0.5:

ross_images.append(out_class)

else:

no_ross_images.append(out_class)

This test video clip is made up of 24 frames per second and number of images predicted for having “Ross” in it are 4715. So the screen time for Ross will be 4715/24 = 196 seconds.

Summary

We can see good accuracy on train and validation dataset but when I tested it on test dataset, the accuracy was about 65%. The one reason that I figured out is less training data. If you can get more data then accuracy can be higher. Another reason can be co-variance shift which means the test dataset is quite different from training dataset due to different video quality.

This type of technique can be very helpful in calculating screen time of a particular character.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Multi Input and Multi Output Models in Keras

4 Replies

The Keras functional API is used to define complex models in deep learning . On of its good use case is to use multiple input and output in a model. In this blog we will learn how to define a keras model which takes more than one input and output.

Multi Output Model

Let say you are using MNIST dataset (handwritten digits images) for creating an autoencoder and classification problem both. In that case, you will be having single input but multiple outputs (predicted class and the generated image). Let take a look into the code.

from keras.layers import Dense, Input
from keras.models import Model

# creating model
inputs = Input(shape = (784,))
dense1 = Dense(512, activation = 'relu')(inputs)
dense2 = Dense(128, activation = 'relu')(dense1)
dense3 = Dense(32, activation = 'relu')(dense2)

# create classification output
classification_output = Dense(10, activation = 'softmax')(dense3)

# use output from dense layer 3 to create autoencder output
up_dense1 = Dense(128, activation = 'relu')(dense3)
up_dense2 = Dense(512, activation = 'relu')(up_dense1)
decoded_outputs = Dense(784)(up_dense2)

from keras.layers import Dense, Input

from keras.models import Model

# creating model

inputs = Input(shape = (784,))

dense1 = Dense(512, activation = 'relu')(inputs)

dense2 = Dense(128, activation = 'relu')(dense1)

dense3 = Dense(32, activation = 'relu')(dense2)

# create classification output

classification_output = Dense(10, activation = 'softmax')(dense3)

# use output from dense layer 3 to create autoencder output

up_dense1 = Dense(128, activation = 'relu')(dense3)

up_dense2 = Dense(512, activation = 'relu')(up_dense1)

decoded_outputs = Dense(784)(up_dense2)

In the above code we have used a single input layer and two output layers as ‘classification_output’ and ‘decoder_output’. Let’s see how to create model with these input and outputs.

model = Model(inputs, [classification_output,decoded_outputs])
model.summary()

1 2	model = Model(inputs, [classification_output,decoded_outputs]) model.summary()

Now we have created the model, the next thing is to compile this model. Here we will define two loss functions for both outputs. Also we can assign weights for both losses. See code.

m = 256
n_epoch = 25
model.compile(optimizer='adam', loss=['categorical_crossentropy', 'mse'], loss_weights = [1.0, 0.5], metrics = ['accuracy'])
model.fit(output_X_train,[Y_train, output_X_train], epochs=n_epoch, batch_size=m, shuffle=True)

m = 256

n_epoch = 25

model.compile(optimizer='adam', loss=['categorical_crossentropy', 'mse'], loss_weights = [1.0, 0.5], metrics = ['accuracy'])

model.fit(output_X_train,[Y_train, output_X_train], epochs=n_epoch, batch_size=m, shuffle=True)

Multi Input Model

Let’s take an example where you need to take two inputs: one grayscale image and another RGB image. Using these two images you want to do an image classification. To perform this, we will use Keras functional API. Let’s see code.

# feature extraction from gray scale image
inputs = Input(shape = (28,28,1))

conv1 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(inputs)
pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)
conv2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1)
pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)
flat_1 = Flatten()(pool2)

# feature extraction from RGB image
inputs_2 = Input(shape = (28,28,3))

conv1_2 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(inputs_2)
pool1_2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1_2)
conv2_2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1_2)
pool2_2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2_2)
flat_2 = Flatten()(pool2_2)

# concatenate both feature layers and define output layer after some dense layers
concat = concatenate([flat_1,flat_2])
dense1 = Dense(512, activation = 'relu')(concat)
dense2 = Dense(128, activation = 'relu')(dense1)
dense3 = Dense(32, activation = 'relu')(dense2)
output = Dense(10, activation = 'softmax')(dense3)

# create model with two inputs
model = Model([inputs,inputs_2], dense1)

# feature extraction from gray scale image

inputs = Input(shape = (28,28,1))

conv1 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(inputs)

pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)

conv2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1)

pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)

flat_1 = Flatten()(pool2)

# feature extraction from RGB image

inputs_2 = Input(shape = (28,28,3))

conv1_2 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(inputs_2)

pool1_2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1_2)

conv2_2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1_2)

pool2_2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2_2)

flat_2 = Flatten()(pool2_2)

# concatenate both feature layers and define output layer after some dense layers

concat = concatenate([flat_1,flat_2])

dense1 = Dense(512, activation = 'relu')(concat)

dense2 = Dense(128, activation = 'relu')(dense1)

dense3 = Dense(32, activation = 'relu')(dense2)

output = Dense(10, activation = 'softmax')(dense3)

# create model with two inputs

model = Model([inputs,inputs_2], dense1)

In the above code, we have extracted two different feature layers from both inputs and then concatenated both to create output layer. And created model with two inputs and one output.

A nice example where you can you use both multi input and multi output is capsule network. If you want to take a look into this, refer this blog.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Dimensionality Reduction for Data Visualization using Autoencoders

Leave a reply

In the previous blog, I have explained concept behind autoencoders and its applications. In this blog we will learn one of the interesting practical application of autoencoders.

Autoencoders are the neural network that are trained to reconstruct their original input. But only reconstructing original input will be useless. The main purpose is to learn interesting features using autoencoders. In this blog we will see how autoencoders can be used to learn interesting features to visualize high dimensional data.

Let say if you are having a 10 dimensional vector, then it will be difficult to visualize it. Then you need to convert it into 2-D or 3-D representation for visualization purpose. There are some famous algorithms like principal component analysis that are used for dimensionality reduction. But if you implement an autoencoder that only uses linear activation function with mean squared error as its loss function, then it will end up performing principal component analysis.

Here we will visualize a 3 dimensional data into 2 dimensional using a simple autoencoder implemented in keras.

Autoencoder model architecture for generating 2-d representation will be as follows:

Input layer with 3 nodes.
1 hidden dense layer with 2 nodes and linear activation.
1 output dense layer with 3 nodes and linear activation.
Loss function is mse and optimizer is adam.

The following code will generate a compressed representation of input data.

from keras.layers import Dense, Input, Activation
from keras.models import Model
import numpy as np

encoder_inputs = Input(shape = (3,))
hidden_layer = Dense(2, activation = 'linear')(encoder_inputs)
output = Dense(3, activation = 'linear')(hidden_layer)
encoder = Model(encoder_inputs,hidden_layer)
model = Model(encoder_inputs, output)

batch_size = 64
n_epoch = 100
model.compile(optimizer='adam', loss='mse')
model.fit(data,data, epochs=n_epoch, batch_size=batch_size, shuffle=True)

from keras.layers import Dense, Input, Activation

from keras.models import Model

import numpy as np

encoder_inputs = Input(shape = (3,))

hidden_layer = Dense(2, activation = 'linear')(encoder_inputs)

output = Dense(3, activation = 'linear')(hidden_layer)

encoder = Model(encoder_inputs,hidden_layer)

model = Model(encoder_inputs, output)

batch_size = 64

n_epoch = 100

model.compile(optimizer='adam', loss='mse')

model.fit(data,data, epochs=n_epoch, batch_size=batch_size, shuffle=True)

Here is the generated 2-D representation of input 3-D data.

In the similar way you can visualize high dimensional data into 2-Dimensional or 3-Dimensional vectors.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Implementing Capsule Network in Keras

14 Replies

In the last blog we have seen that what is a capsule network and how it can overcome the problems associated with convolutional neural network. In this blog we will implement a capsule network in keras.

You can find full code here.

Here, we will use handwritten digit dataset(MNIST) and train the capsule network to classify the digits. MNIST digit dataset consists of grayscale images of size 28*28.

Capsule Network architecture is somewhat similar to convolutional neural network except capsule layers. We can break the implementation of capsule network into following steps:

Initial convolutional layer
Primary capsule layer
Digit capsule layer
Decoder network
Loss Functions
Training and testing of model

Initial Convolution Layer:

Initially we will use a convolution layer to detect low level features of an image. It will use 256 filters each of size 9*9 with stride 1 and activation function is relu. Input size of image is 28*28, after applying this layer output size will be 20*20*256.

input_shape = Input(shape=(28,28,1))  # size of input image is 28*28

# a convolution layer output shape = 20*20*256
conv1 = Conv2D(256, (9,9), activation = 'relu', padding = 'valid')(input_shape)

input_shape = Input(shape=(28,28,1)) # size of input image is 28*28

# a convolution layer output shape = 20*20*256

conv1 = Conv2D(256, (9,9), activation = 'relu', padding = 'valid')(input_shape)

Primary Capsule Layer:

The output from the previous layer is being passed to 256 filters each of size 9*9 with a stride of 2 which will produce an output of size 6*6*256. This output is then reshaped into 8-dimensional vector. So shape will be 6*6*32 capsules each of which will be 8-dimensional. Then it will pass through a non-linear function(squash) so that length of output vector can be maintained between 0 and 1.

# convolution layer with stride 2 and 256 filters of size 9*9
conv2 = Conv2D(256, (9,9), strides = 2, padding = 'valid')(conv1)

# reshape into 1152 capsules of 8 dimensional vectors
reshaped = Reshape((6*6*32,8))(conv2)

# squash the reshaped output to make length of vector b/w 0 and 1
squashed_output = Lambda(squash)(reshaped)

def squash(inputs):
    # take norm of input vectors
    squared_norm = K.sum(K.square(inputs), axis = -1, keepdims = True)

    # use the formula for non-linear function to return squashed output
    return ((squared_norm/(1+squared_norm))/(K.sqrt(squared_norm+K.epsilon())))*inputs

# convolution layer with stride 2 and 256 filters of size 9*9

conv2 = Conv2D(256, (9,9), strides = 2, padding = 'valid')(conv1)

# reshape into 1152 capsules of 8 dimensional vectors

reshaped = Reshape((6*6*32,8))(conv2)

# squash the reshaped output to make length of vector b/w 0 and 1

squashed_output = Lambda(squash)(reshaped)

def squash(inputs):

# take norm of input vectors

squared_norm = K.sum(K.square(inputs), axis = -1, keepdims = True)

# use the formula for non-linear function to return squashed output

return ((squared_norm/(1+squared_norm))/(K.sqrt(squared_norm+K.epsilon())))*inputs

Digit Capsule Layer:

Logic and algorithm used for this layer is explained in the previous blog. Here we will see what we need to do in code to implement it. We need to write a custom layer in keras. It will take 1152*8 as its input and produces output of size 10*16, where 10 capsules each represents an output class with 16 dimensional vector. Then each of these 10 capsules are converted into single value to predict the output class using a lambda layer.

class DigitCapsuleLayer(Layer):
    # creating a layer class in keras
    def __init__(self, **kwargs):
        super(DigitCapsuleLayer, self).__init__(**kwargs)
        self.kernel_initializer = initializers.get('glorot_uniform')
    
    def build(self, input_shape): 
        # initialize weight matrix for each capsule in lower layer
        self.W = self.add_weight(shape = [10, 6*6*32, 16, 8], initializer = self.kernel_initializer, name = 'weights')
        self.built = True
    
    def call(self, inputs):
        inputs = K.expand_dims(inputs, 1)
        inputs = K.tile(inputs, [1, 10, 1, 1])
        # matrix multiplication b/w previous layer output and weight matrix
        inputs = K.map_fn(lambda x: K.batch_dot(x, self.W, [2, 3]), elems=inputs)
        b = tf.zeros(shape = [K.shape(inputs)[0], 10, 6*6*32])
        
# routing algorithm with updating coupling coefficient c, using scalar product b/w input capsule and output capsule
        for i in range(3-1):
            c = tf.nn.softmax(b, dim=1)
            s = K.batch_dot(c, inputs, [2, 2])
            v = squash(s)
            b = b + K.batch_dot(v, inputs, [2,3])
            
        return v 
    def compute_output_shape(self, input_shape):
        return tuple([None, 10, 16])

def output_layer(inputs):
    return K.sqrt(K.sum(K.square(inputs), -1) + K.epsilon())

digit_caps = DigitCapsuleLayer()(squashed_output)
outputs = Lambda(output_layer)(digit_caps)

class DigitCapsuleLayer(Layer):

# creating a layer class in keras

def __init__(self, **kwargs):

super(DigitCapsuleLayer, self).__init__(**kwargs)

self.kernel_initializer = initializers.get('glorot_uniform')

def build(self, input_shape):

# initialize weight matrix for each capsule in lower layer

self.W = self.add_weight(shape = [10, 6*6*32, 16, 8], initializer = self.kernel_initializer, name = 'weights')

self.built = True

def call(self, inputs):

inputs = K.expand_dims(inputs, 1)

inputs = K.tile(inputs, [1, 10, 1, 1])

# matrix multiplication b/w previous layer output and weight matrix

inputs = K.map_fn(lambda x: K.batch_dot(x, self.W, [2, 3]), elems=inputs)

b = tf.zeros(shape = [K.shape(inputs)[0], 10, 6*6*32])

# routing algorithm with updating coupling coefficient c, using scalar product b/w input capsule and output capsule

for i in range(3-1):

c = tf.nn.softmax(b, dim=1)

s = K.batch_dot(c, inputs, [2, 2])

v = squash(s)

b = b + K.batch_dot(v, inputs, [2,3])

return v

def compute_output_shape(self, input_shape):

return tuple([None, 10, 16])

def output_layer(inputs):

return K.sqrt(K.sum(K.square(inputs), -1) + K.epsilon())

digit_caps = DigitCapsuleLayer()(squashed_output)

outputs = Lambda(output_layer)(digit_caps)

Decoder Network:

To further boost the pose parameters learned by the digit capsule layer, we can add decoder network to reconstruct the input image. In this part, decoder network will be fed with an input of size 10*16 (digit capsule layer output) and will reconstruct back the original image of size 28*28. Decoder will consist of 3 dense layer having 512, 1024 and 784 nodes.

During training time input to the decoder is the output from digit capsule layer which is masked with original labels. It means that other vectors except the vector corresponding to correct label will be multiplied with zero. So that decoder can only be trained with correct digit capsule. In test time input to decoder will be the same output from digit capsule layer but masked with highest length vector in that layer. Lets see the code.

def mask(outputs):

    if type(outputs) != list:  # mask at test time
        norm_outputs = K.sqrt(K.sum(K.square(outputs), -1) + K.epsilon())
        y  = K.one_hot(indices=K.argmax(norm_outputs, 1), num_classes = 10)
        y = Reshape((10,1))(y)
        return Flatten()(y*outputs)

    else:    # mask at train time
        y = Reshape((10,1))(outputs[1])
        masked_output = y*outputs[0]
        return Flatten()(masked_output)

inputs = Input(shape = (10,))
masked = Lambda(mask)([digit_caps, inputs])
masked_for_test = Lambda(mask)(digit_caps)

decoded_inputs = Input(shape = (16*10,))
dense1 = Dense(512, activation = 'relu')(decoded_inputs)
dense2 = Dense(1024, activation = 'relu')(dense1)
decoded_outputs = Dense(784, activation = 'sigmoid')(dense2)
decoded_outputs = Reshape((28,28,1))(decoded_outputs)

def mask(outputs):

if type(outputs) != list: # mask at test time

norm_outputs = K.sqrt(K.sum(K.square(outputs), -1) + K.epsilon())

y = K.one_hot(indices=K.argmax(norm_outputs, 1), num_classes = 10)

y = Reshape((10,1))(y)

return Flatten()(y*outputs)

else: # mask at train time

y = Reshape((10,1))(outputs[1])

masked_output = y*outputs[0]

return Flatten()(masked_output)

inputs = Input(shape = (10,))

masked = Lambda(mask)([digit_caps, inputs])

masked_for_test = Lambda(mask)(digit_caps)

decoded_inputs = Input(shape = (16*10,))

dense1 = Dense(512, activation = 'relu')(decoded_inputs)

dense2 = Dense(1024, activation = 'relu')(dense1)

decoded_outputs = Dense(784, activation = 'sigmoid')(dense2)

decoded_outputs = Reshape((28,28,1))(decoded_outputs)

Loss Functions:

It uses two loss function one is probabilistic loss function used for classifying digits image and another is reconstruction loss which is mean squared error. Lets see probabilistic loss which is simple to understand once you look at following code.

def loss_fn(y_true, y_pred):

    L = y_true * K.square(K.maximum(0., 0.9 - y_pred)) + 0.5 * (1 - y_true) * K.square(K.maximum(0., y_pred - 0.1))

    return K.mean(K.sum(L, 1))

def loss_fn(y_true, y_pred):

L = y_true * K.square(K.maximum(0., 0.9 - y_pred)) + 0.5 * (1 - y_true) * K.square(K.maximum(0., y_pred - 0.1))

return K.mean(K.sum(L, 1))

Training and Testing of model:

Now define our training and testing model and train it on MNIST digit dataset.

decoder = Model(decoded_inputs, decoded_outputs)
model = Model([input_shape,inputs],[outputs,decoder(masked)])
test_model = Model(input_shape,[outputs,decoder(masked_for_test)])

m = 128
epochs = 10
model.compile(optimizer=keras.optimizers.Adam(lr=0.001),loss=[loss_fn,'mse'],loss_weights = [1. ,0.0005],metrics=['accuracy'])
model.fit([x_train, y_train],[y_train,x_train], batch_size = m, epochs = epochs, validation_data = ([x_test, y_test],[y_test,x_test]))

decoder = Model(decoded_inputs, decoded_outputs)

model = Model([input_shape,inputs],[outputs,decoder(masked)])

test_model = Model(input_shape,[outputs,decoder(masked_for_test)])

m = 128

epochs = 10

model.compile(optimizer=keras.optimizers.Adam(lr=0.001),loss=[loss_fn,'mse'],loss_weights = [1. ,0.0005],metrics=['accuracy'])

model.fit([x_train, y_train],[y_train,x_train], batch_size = m, epochs = epochs, validation_data = ([x_test, y_test],[y_test,x_test]))

In test data set it was able to achieve 99.09% accuracy. Pretty good yeah! Also reconstructed images looks good. Here are the reconstructed images generated by decoder network.

Capsule Network comes with promising results and yet to be explored thoroughly. There are various bits and bytes where it can be explored. Research on a capsule network is still in an early stage but it has given clear indication that it is worth exploring.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Feeding output of a given intermediate layer in Keras as the input to another network

Leave a reply

Keras is a high level neural network library used for fast experimentation, user friendliness and easy extensibility. It is highly recommended library for a beginner in neural networks. In this blog we will learn how to use an intermediate layer of a neural network as input to another network.

Sometimes you might get stuck while using an output of an intermediate layer with the errors like ‘graph disconnected‘. Lets see how we can solve this through the code.

First, Lets create an autoencoder model. If you are not aware of what is an autoencoder, you can follow this blog.

# creating autoencoder model
encoder_inputs = Input(shape = (28,28,1))

conv1 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(encoder_inputs)
pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)
conv2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1)
pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)
flat = Flatten()(pool2)
encoder_outputs = Dense(32, activation = 'relu')(flat)

dense_layer_d = Dense(7*7*32, activation = 'relu')(encoder_outputs)
output_from_d = Reshape((7,7,32))(dense_layer_d)
conv1_1 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(output_from_d)
upsampling_1 = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(conv1_1)
upsampling_2 = Conv2DTranspose(16, 3, padding='same', activation='relu', strides=(2, 2))(upsampling_1)
decoded_outputs = Conv2DTranspose(1, 3, padding='same', activation='relu')(upsampling_2)

autoencoder = Model(encoder_inputs, decoded_outputs)

# creating autoencoder model

encoder_inputs = Input(shape = (28,28,1))

conv1 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(encoder_inputs)

pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)

conv2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1)

pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)

flat = Flatten()(pool2)

encoder_outputs = Dense(32, activation = 'relu')(flat)

dense_layer_d = Dense(7*7*32, activation = 'relu')(encoder_outputs)

output_from_d = Reshape((7,7,32))(dense_layer_d)

conv1_1 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(output_from_d)

upsampling_1 = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(conv1_1)

upsampling_2 = Conv2DTranspose(16, 3, padding='same', activation='relu', strides=(2, 2))(upsampling_1)

decoded_outputs = Conv2DTranspose(1, 3, padding='same', activation='relu')(upsampling_2)

autoencoder = Model(encoder_inputs, decoded_outputs)

In the above code we have created an autoencoder model. At line 9, we have generated encoder outputs. Now if you want to create decoder network from this model with encoder_outputs layer as it input, what should you do? A beginner will do something like this:

decoder_network = Model(dense_layer_d, decoded_outputs)

1	decoder_network = Model(dense_layer_d, decoded_outputs)

But this will throw an error ‘graph disconnected’. This is because dense_layer_d layer is connected to another previous layer and you have disconnected it to directly take this layer as input. To solve this problem you can do something like this:

decoder_input = Input(shape = (32,))
next_layer = decoder_input
for layer in autoencoder.layers[-6:]:
    next_layer = layer(next_layer)

decoder = Model(decoder_input, next_layer)

decoder_input = Input(shape = (32,))

next_layer = decoder_input

for layer in autoencoder.layers[-6:]:

next_layer = layer(next_layer)

decoder = Model(decoder_input, next_layer)

Earlier we have created a model autoencoder. Now if you want to get its intermediate layer, use following steps:

Find index of the input layer to decoder( in the given autoencoder model it is the 6th layer from last so -6)
Use autoencoder.layers to get that layer.
Iterate through the following layers in the autoencoder model, till the decoder_output layer.
Then create model using decoder_input and last iterated layer.

This will successfully create a decoder model which will take the output of an intermediate layer ‘encoder_outputs’ as its input. And that’s it!!

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Custom Layers in Keras

Leave a reply

A model in Keras is composed of layers. There are in-built layers present in Keras which you can directly import like Conv2D, Pool, Flatten, Reshape, etc. But sometimes you need to add your own custom layer. In this blog, we will learn how to add a custom layer in Keras.

There are basically two types of custom layers that you can add in Keras.

Lambda Layer

Lambda layer is useful whenever you need to do some operation on previous layer and do not want to add any trainable weights to it.

Let say you want to add your own activation function (which is not built-in Keras) to a layer. Then you first need to define a function which will take the output from the previous layer as input and apply custom activation function to it. We then pass this function to lambda layer.

from keras.layers import Lambda
from keras import backend as K

# defining a custom non linear function
def activation_relu(inputs):
    return K.maximum(0.,inputs)

# call function using lambda layer
squashed_output = Lambda(activation_relu)(inputs) # where inputs are output from previous layer

from keras.layers import Lambda

from keras import backend as K

# defining a custom non linear function

def activation_relu(inputs):

return K.maximum(0.,inputs)

# call function using lambda layer

squashed_output = Lambda(activation_relu)(inputs) # where inputs are output from previous layer

Custom Class Layer

Sometimes you want to create your own layer with trainable weights which is not in-built in Keras. In that case you need to create a custom class layer where you need to define following methods.

__init__ method to initialize class variable and super class variables
build method to define weights.
call method where you will perform all your operations.
compute_output_shape method to define output shape of this custom layer

Lets see an example of a custom layer class. Here you only need to focus on the architecture of the class.


     class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.W= self.add_weight(name='kernel', 
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        self.built = True 

    def call(self, x):
        return K.dot(x, self.W)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

class MyLayer(Layer):

def __init__(self, output_dim, **kwargs):

self.output_dim = output_dim

super(MyLayer, self).__init__(**kwargs)

def build(self, input_shape):

# Create a trainable weight variable for this layer.

self.W= self.add_weight(name='kernel',

shape=(input_shape[1], self.output_dim),

initializer='uniform',

trainable=True)

self.built = True

def call(self, x):

return K.dot(x, self.W)

def compute_output_shape(self, input_shape):

return (input_shape[0], self.output_dim)

In the build method defining self.built = True is necessary. Also, you can see that all logic is written inside call(self, inputs) method. comput_output_shape will define the output shape of the layer.

You can also pass multiple input tensor to this custom layer. The only thing you need to do is, pass multiple inputs using a list.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Saving and Loading models in Keras

1 Reply

Generally, a deep learning model takes a large amount of time to train, so its better to know how to save trained model. In this blog we will learn about how to save whole keras model i.e. its architecture, weights and optimizer state.

Lets first create a model in Keras. This is a simple autoencoder model. If you need to know more about autoencoders please refer this blog.

from keras.models import Model
from keras.layers import Dense, Input
import matplotlib.pyplot as plt
import numpy as np

# creating an autoencoder model
input_shape = Input(shape = (784,))
dense1 = Dense(512, activation = 'relu')(input_shape)
dense2 = Dense(256, activation = 'relu')(dense1)
encoded = Dense(32, activation = 'relu')(dense2)

dense3 = Dense(256, activation = 'relu')(encoded)
dense4 = Dense(512, activation = 'relu')(dense3)
decoded = Dense(784, activation = 'relu')(dense4)

autoencoder = Model(input_shape, decoded)

m = 256
n_epoch = 25
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(output_X_train,output_X_train, epochs=n_epoch, batch_size=m, shuffle=True)

from keras.models import Model

from keras.layers import Dense, Input

import matplotlib.pyplot as plt

import numpy as np

# creating an autoencoder model

input_shape = Input(shape = (784,))

dense1 = Dense(512, activation = 'relu')(input_shape)

dense2 = Dense(256, activation = 'relu')(dense1)

encoded = Dense(32, activation = 'relu')(dense2)

dense3 = Dense(256, activation = 'relu')(encoded)

dense4 = Dense(512, activation = 'relu')(dense3)

decoded = Dense(784, activation = 'relu')(dense4)

autoencoder = Model(input_shape, decoded)

m = 256

n_epoch = 25

autoencoder.compile(optimizer='adam', loss='mse')

autoencoder.fit(output_X_train,output_X_train, epochs=n_epoch, batch_size=m, shuffle=True)

Above we have created a Keras model named as “autoencoder“. Now lets see how to save this model.

Saving and loading only architecture of a model

In keras, you can save and load architecture of a model in two formats: JSON or YAML Models generated in these two format are human readable and can be edited if needed.

# saving and loading model architecture in json format

# saving in json format
json_model = autoencoder.to_json()
json_file = open('autoencoder_json.json', 'w')
json_file.write(json_model)

# loading model architecture from json file
from keras.models import model_from_json
json_file = open('autoencoder_json.json', 'r')
json_model = model_from_json(json_file.read())

# saving and loading model architecture in json format

# saving in json format

json_model = autoencoder.to_json()

json_file = open('autoencoder_json.json', 'w')

json_file.write(json_model)

# loading model architecture from json file

from keras.models import model_from_json

json_file = open('autoencoder_json.json', 'r')

json_model = model_from_json(json_file.read())

# saving and loading model architecture in yaml format

# saving in yaml format
yaml_model = autoencoder.to_yaml()
yaml_file = open('autoencoder_yaml.yaml', 'w')
yaml_file.write(yaml_model)

# loading model architecture from yaml file
from keras.models import model_from_yaml
yaml_file = open('autoencoder_yaml.yaml', 'r')
yaml_model = model_from_yaml(yaml_file.read())

# saving and loading model architecture in yaml format

# saving in yaml format

yaml_model = autoencoder.to_yaml()

yaml_file = open('autoencoder_yaml.yaml', 'w')

yaml_file.write(yaml_model)

# loading model architecture from yaml file

from keras.models import model_from_yaml

yaml_file = open('autoencoder_yaml.yaml', 'r')

yaml_model = model_from_yaml(yaml_file.read())

Saving and Loading Weights of a Keras Model

With model architecture you will also need model weights to predict output from trained model.

# saving model weights
autoencoder.save_weights('autoencoder_weights.h5')

# loading weights of a keras model
json_model.load_weights('autoencoder_weights.h5')

# saving model weights

autoencoder.save_weights('autoencoder_weights.h5')

# loading weights of a keras model

json_model.load_weights('autoencoder_weights.h5')

Saving and Loading Both Architecture and Weights in one File

# saving whole model
autoencoder.save('autoencoder_model.h5')

# loading whole model
from keras.models import load_model
model1 = load_model('autoencoder_model.h5')

# saving whole model

autoencoder.save('autoencoder_model.h5')

# loading whole model

from keras.models import load_model

model1 = load_model('autoencoder_model.h5')

This will save following four parameters in “autoencoder_model.h5” file:

Model Architecture
Model Weights
Loss and Optimizer
State of the optimizer allowing to resume training where you left.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Compression of data using Autoencoders

Leave a reply

In the last blog, we discussed what autoencoders are. In this blog, we will learn, how autoencoders can be used to compress data and reconstruct back the original data.

Here I have used MNIST dataset. First, I have downloaded MNIST dataset which is having digits images(0 to 9), a total of size 45 MB. Let’s, see the code to download data using python.

# download training and test data from mnist and reshape it
from keras.datasets import mnist
(X_train, _), (_, _) = mnist.load_data()
X_train = X_train.astype('float32') / 255.
output_X_train = X_train.reshape(-1,28,28,1)

# download training and test data from mnist and reshape it

from keras.datasets import mnist

(X_train, _), (_, _) = mnist.load_data()

X_train = X_train.astype('float32') / 255.

output_X_train = X_train.reshape(-1,28,28,1)

Since we want to compress the dataset and reconstruct back it into original data, first we have to create a convolutional autoencoder. Let’s see code:

# creating autoencoder model
encoder_inputs = Input(shape = (28,28,1))

conv1 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(encoder_inputs)
pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)
conv2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1)
pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)
flat = Flatten()(pool2)

enocoder_outputs = Dense(32, activation = 'relu')(flat)
#upsampling in decoder

dense_layer_d = Dense(7*7*32, activation = 'relu')(enocoder_outputs)
output_from_d = Reshape((7,7,32))(dense_layer_d)
conv1_1 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(output_from_d)
upsampling_1 = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(conv1_1)
upsampling_2 = Conv2DTranspose(16, 3, padding='same', activation='relu', strides=(2, 2))(upsampling_1)
decoded_outputs = Conv2DTranspose(1, 3, padding='same', activation='relu')(upsampling_2)

autoencoder = Model(encoder_inputs, decoded_outputs)

# creating autoencoder model

encoder_inputs = Input(shape = (28,28,1))

conv1 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(encoder_inputs)

pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)

conv2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1)

pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)

flat = Flatten()(pool2)

enocoder_outputs = Dense(32, activation = 'relu')(flat)

#upsampling in decoder

dense_layer_d = Dense(7*7*32, activation = 'relu')(enocoder_outputs)

output_from_d = Reshape((7,7,32))(dense_layer_d)

conv1_1 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(output_from_d)

upsampling_1 = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(conv1_1)

upsampling_2 = Conv2DTranspose(16, 3, padding='same', activation='relu', strides=(2, 2))(upsampling_1)

decoded_outputs = Conv2DTranspose(1, 3, padding='same', activation='relu')(upsampling_2)

autoencoder = Model(encoder_inputs, decoded_outputs)

From this autoencoder model, I have created encoder and decoder model. Encoder model will compress the data and decoder model will be used while reconstructing original data. Then trained the auotoencoder model.

decoder_input = Input(shape = (32,))
next_layer = decoder_input
for layer in autoencoder.layers[-6:]:  # to get input layer for decoder
    next_layer = layer(next_layer)

decoder = Model(decoder_input, next_layer)

encoder = Model(encoder_inputs, enocoder_outputs)

m = 256 # batch size
n_epoch = 100
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(output_X_train,output_X_train, epochs=n_epoch, batch_size=m, shuffle=True)

decoder_input = Input(shape = (32,))

next_layer = decoder_input

for layer in autoencoder.layers[-6:]: # to get input layer for decoder

next_layer = layer(next_layer)

decoder = Model(decoder_input, next_layer)

encoder = Model(encoder_inputs, enocoder_outputs)

m = 256 # batch size

n_epoch = 100

autoencoder.compile(optimizer='adam', loss='mse')

autoencoder.fit(output_X_train,output_X_train, epochs=n_epoch, batch_size=m, shuffle=True)

Using encoder model we can save compressed data into a text file. Which having size of 18 MB( Much less then original size 45 MB).

encoded = encoder.predict(output_X_train)
with open('compressed_data.txt', 'w') as data_file:
    for data in encoded:
        for each_data in data:
            data_file.write(str(each_data))
            data_file.write('\n')

encoded = encoder.predict(output_X_train)

with open('compressed_data.txt', 'w') as data_file:

for data in encoded:

for each_data in data:

data_file.write(str(each_data))

data_file.write('\n')

Now next thing is how we can reconstruct this compressed data when original data is needed. The simple solution is, we can save our decoder model and its weight which will be used further to reconstruct this compressed data. Let’s save decoder model and it’s weights.

decoder.save_weights('decoder.h5')
decoder_json = decoder.to_json()
with open('decoder.json', 'w') as json_file:
    json_file.write(decoder_json)

decoder.save_weights('decoder.h5')

decoder_json = decoder.to_json()

with open('decoder.json', 'w') as json_file:

json_file.write(decoder_json)

Finally we are having our compressed data and decoder model. Let’s see code how we can simply reconstruct back using these two.

# reading compressed data
with open('compressed_data.txt') as data_file:
    data = data_file.readlines()

compressed_data = [float(x.strip()) for x in data]
compressed_data= [compressed_data[i:i+32] for i in range(0, len(compressed_data), 32)] 

# load decoder model and its weights
json_file = open('decoder.json', 'r')
loaded_json_model = json_file.read()
decoder = model_from_json(loaded_json_model)
decoder.load_weights('decoder.h5')

decoded_imgs  = decoder.predict(np.array(compressed_data))

# reading compressed data

with open('compressed_data.txt') as data_file:

data = data_file.readlines()

compressed_data = [float(x.strip()) for x in data]

compressed_data= [compressed_data[i:i+32] for i in range(0, len(compressed_data), 32)]

# load decoder model and its weights

json_file = open('decoder.json', 'r')

loaded_json_model = json_file.read()

decoder = model_from_json(loaded_json_model)

decoder.load_weights('decoder.h5')

decoded_imgs = decoder.predict(np.array(compressed_data))

Above are our output from decoder model.

It looks fascinating to compress data to less size and get same data back when we need, but there are some real problem with this method.

The problem is autoencoders can not generalize. Autoencoders can only reconstruct images for which these are trained. But with the advancement in deep learning those days are not far away when you will use this type compression using deep learning.

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

Sparse Autoencoders

1 Reply

In the last blog we have seen autoencoders and its applications. In this blog we will learn one of its variant, sparse autoencoders.

In every autoencoder, we try to learn compressed representation of the input. Let’s take an example of a simple autoencoder having input vector dimension of 1000, compressed into 500 hidden units and reconstructed back into 1000 outputs. The hidden units will learn correlated features present in the input. But what if input features are completely random? Then it will we difficult for hidden units to learn interesting structure present in data. In that situation what we can do is increase the number of hidden units and add some sparsity constraints. Now the question is what are sparsity constraints?

When sparsity constraints added to a hidden unit, it only activates some units (having large activation values) and makes rest to zero. So, even if we are having a large number of hidden units( as in the above example), it will only fire some hidden units and learn useful structure present in the data.

The simplest implementation of sparsity constraints can be done in keras. You can simple add activity_regularizer to a layer (see line 11) and it will do the rest.

from keras import regularizers
# creating autoencoder model
encoder_inputs = Input(shape = (28,28,1))

conv1 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(encoder_inputs)
pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)
conv2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1)
pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)
flat = Flatten()(pool2)

enocder_outputs = Dense(32, activation = 'relu', activity_regularizer=regularizers.l1(10e-5))(flat)

#upsampling in decoder

dense_layer_d = Dense(7*7*32, activation = 'relu')(enocder_outputs)
output_from_d = Reshape((7,7,32))(dense_layer_d)
conv1_1 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(output_from_d)
upsampling_1 = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(conv1_1)
upsampling_2 = Conv2DTranspose(16, 3, padding='same', activation='relu', strides=(2, 2))(upsampling_1)
decoded_outputs = Conv2DTranspose(1, 3, padding='same', activation='relu')(upsampling_2)

autoencoder = Model(encoder_inputs, decoded_outputs)

from keras import regularizers

# creating autoencoder model

encoder_inputs = Input(shape = (28,28,1))

conv1 = Conv2D(16, (3,3), activation = 'relu', padding = "SAME")(encoder_inputs)

pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)

conv2 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(pool1)

pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)

flat = Flatten()(pool2)

enocder_outputs = Dense(32, activation = 'relu', activity_regularizer=regularizers.l1(10e-5))(flat)

#upsampling in decoder

dense_layer_d = Dense(7*7*32, activation = 'relu')(enocder_outputs)

output_from_d = Reshape((7,7,32))(dense_layer_d)

conv1_1 = Conv2D(32, (3,3), activation = 'relu', padding = "SAME")(output_from_d)

upsampling_1 = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(conv1_1)

upsampling_2 = Conv2DTranspose(16, 3, padding='same', activation='relu', strides=(2, 2))(upsampling_1)

decoded_outputs = Conv2DTranspose(1, 3, padding='same', activation='relu')(upsampling_2)

autoencoder = Model(encoder_inputs, decoded_outputs)

But, if you want to add sparse constraints by writing your own function, you can follow reference given below.

References: Sparse Autoencoders

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.

TheAILearner

Mastering Artificial Intelligence

Tag Archives: keras

Calculating Screen Time of an Actor using Deep Learning

Creating Dataset

Input Data and Preprocessing

Transfer Learning

Network Architectures

Splitting Data into Train and Validation

Training the Network

Calculating Screen Time

Summary

Multi Input and Multi Output Models in Keras

Multi Output Model

Multi Input Model

Dimensionality Reduction for Data Visualization using Autoencoders

Implementing Capsule Network in Keras

Feeding output of a given intermediate layer in Keras as the input to another network

Custom Layers in Keras

Lambda Layer

Custom Class Layer

Saving and Loading models in Keras

Compression of data using Autoencoders

Sparse Autoencoders