Compression of data using Autoencoders | TheAILearner

In the last blog, we discussed what autoencoders are. In this blog, we will learn how autoencoders can be used to compress data and then reconstruct the original data.

Here I have used the MNIST dataset. First, I downloaded the MNIST dataset, which contains images of handwritten digits (0 to 9) and has a total size of 45 MB. Let’s see the code to download the data using Python.

# download the MNIST training images, scale them to [0, 1] and reshape them to (N, 28, 28, 1)
from keras.datasets import mnist
(X_train, _), (_, _) = mnist.load_data()
X_train = X_train.astype('float32') / 255.
output_X_train = X_train.reshape(-1,28,28,1)

Since we want to compress the dataset and then reconstruct the original data from it, we first have to create a convolutional autoencoder. Let’s see the code:

# creating autoencoder model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Reshape, Conv2DTranspose
from keras.models import Model

encoder_inputs = Input(shape = (28,28,1))

# encoder: two convolution + pooling stages reduce 28x28 to 7x7
conv1 = Conv2D(16, (3,3), activation = 'relu', padding = 'same')(encoder_inputs)
pool1 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv1)
conv2 = Conv2D(32, (3,3), activation = 'relu', padding = 'same')(pool1)
pool2 = MaxPooling2D(pool_size = (2,2), strides = 2)(conv2)
flat = Flatten()(pool2)

# the compressed representation: 32 values per image
enocoder_outputs = Dense(32, activation = 'relu')(flat)

# upsampling in decoder: expand the 32 values back to a 28x28x1 image
dense_layer_d = Dense(7*7*32, activation = 'relu')(enocoder_outputs)
output_from_d = Reshape((7,7,32))(dense_layer_d)
conv1_1 = Conv2D(32, (3,3), activation = 'relu', padding = 'same')(output_from_d)
upsampling_1 = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(conv1_1)
upsampling_2 = Conv2DTranspose(16, 3, padding='same', activation='relu', strides=(2, 2))(upsampling_1)
decoded_outputs = Conv2DTranspose(1, 3, padding='same', activation='relu')(upsampling_2)

autoencoder = Model(encoder_inputs, decoded_outputs)
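To see why the decoder starts with `Dense(7*7*32)` and `Reshape((7,7,32))`, it can help to trace the tensor shapes through the network by hand. The sketch below does this in plain Python, without Keras. The helper functions are hypothetical names introduced only for illustration, and they assume the usual behaviour of 'same' padding and 2x2 pooling with stride 2.

```python
# Trace tensor shapes (height, width, channels) through the autoencoder.
# The helpers are illustrative only; the layer settings come from the model above.

def same_conv(shape, filters):
    """'same' padding with stride 1 keeps height and width; only channels change."""
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, stride=2):
    """2x2 max pooling with stride 2 halves height and width."""
    h, w, c = shape
    return (h // stride, w // stride, c)

def transpose_conv(shape, filters, stride=1):
    """'same'-padded transposed convolution multiplies height and width by the stride."""
    h, w, _ = shape
    return (h * stride, w * stride, filters)

# encoder
shape = (28, 28, 1)
shape = same_conv(shape, 16)   # conv1
shape = max_pool(shape)        # pool1
shape = same_conv(shape, 32)   # conv2
shape = max_pool(shape)        # pool2
encoder_feature_shape = shape
flat_size = shape[0] * shape[1] * shape[2]   # size after Flatten()
code_size = 32                               # units in the encoder's Dense layer

# decoder
shape = encoder_feature_shape                # Dense(7*7*32) + Reshape((7,7,32))
shape = same_conv(shape, 32)                 # conv1_1
shape = transpose_conv(shape, 32, stride=2)  # upsampling_1
shape = transpose_conv(shape, 16, stride=2)  # upsampling_2
shape = transpose_conv(shape, 1)             # decoded_outputs
output_shape = shape

print("encoder feature map:", encoder_feature_shape)
print("flattened size:", flat_size, "-> code size:", code_size)
print("decoder output:", output_shape)
```

The two pooling steps take 28 down to 14 and then 7, and the two stride-2 transposed convolutions take 7 back up to 14 and then 28, so the output has the same shape as the input.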

From this autoencoder model, I have created an encoder model and a decoder model. The encoder model will compress the data, and the decoder model will be used to reconstruct the original data. Then I trained the autoencoder model.

decoder_input = Input(shape = (32,))
next_layer = decoder_input
for layer in autoencoder.layers[-6:]:  # the last six layers of the autoencoder form the decoder
    next_layer = layer(next_layer)

decoder = Model(decoder_input, next_layer)

encoder = Model(encoder_inputs, enocoder_outputs)

m = 256 # batch size
n_epoch = 100
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(output_X_train,output_X_train, epochs=n_epoch, batch_size=m, shuffle=True)

Using the encoder model, we can save the compressed data into a text file, which has a size of 18 MB (much less than the original size of 45 MB).

encoded = encoder.predict(output_X_train)
with open('compressed_data.txt', 'w') as data_file:
    for data in encoded:
        for each_data in data:
            data_file.write(str(each_data))
            data_file.write('\n')
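A text file spends one byte on every character of every value, plus a newline. As a rough check on the sizes, the sketch below works out the numbers, assuming the 45 MB figure refers to the 60,000 MNIST training images stored at one byte per pixel. It also shows one possible binary alternative that stores each value as a 4-byte float32 using Python's `array` module. The helpers `save_codes` and `load_codes` and the file name `compressed_data.bin` are hypothetical and not part of the original code.

```python
import os
import tempfile
from array import array

NUM_IMAGES = 60_000          # size of the MNIST training set
PIXELS_PER_IMAGE = 28 * 28   # one byte per pixel in the raw images
CODE_SIZE = 32               # values per image produced by the encoder

raw_bytes = NUM_IMAGES * PIXELS_PER_IMAGE        # uncompressed pixels
binary_code_bytes = NUM_IMAGES * CODE_SIZE * 4   # codes stored as float32

def save_codes(path, codes):
    """Write equal-length code vectors to a file as raw float32 values."""
    flat = array('f')
    for code in codes:
        flat.extend(float(v) for v in code)
    with open(path, 'wb') as f:
        flat.tofile(f)

def load_codes(path, code_size=CODE_SIZE):
    """Read the float32 values back and regroup them into code vectors."""
    flat = array('f')
    with open(path, 'rb') as f:
        flat.frombytes(f.read())
    return [list(flat[i:i + code_size]) for i in range(0, len(flat), code_size)]

# round trip on a tiny made-up example (two codes of four values each)
example_codes = [[0.0, 0.5, 1.25, 2.0], [3.5, 0.0, 0.75, 4.0]]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'compressed_data.bin')
    save_codes(path, example_codes)
    file_size = os.path.getsize(path)
    restored = load_codes(path, code_size=4)

print(f"raw pixels:    {raw_bytes:,} bytes")
print(f"float32 codes: {binary_code_bytes:,} bytes")
print("round trip ok:", restored == example_codes, "file size:", file_size)
```

With the real data, `save_codes('compressed_data.bin', encoded)` would take the place of the text-writing loop. Under these assumptions the binary file would be under 8 MB, compared with 18 MB for the text file.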

The next question is how we can reconstruct this compressed data when the original data is needed. The simple solution is to save our decoder model and its weights, which will be used later to reconstruct the compressed data. Let’s save the decoder model and its weights.

decoder.save_weights('decoder.h5')
decoder_json = decoder.to_json()
with open('decoder.json', 'w') as json_file:
    json_file.write(decoder_json)
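Since the decoder has to be stored alongside the compressed data, its size is part of the total cost. A rough estimate can be made by counting the decoder's trainable parameters by hand, as in the sketch below. The helper names are hypothetical, and the byte estimate assumes float32 weights and ignores any overhead of the HDF5 and JSON files.

```python
def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter.
    The count is the same for Conv2D and Conv2DTranspose."""
    return kernel * kernel * in_channels * filters + filters

# decoder layers of the model defined earlier (Reshape has no parameters)
decoder_layer_params = {
    "dense_layer_d": dense_params(32, 7 * 7 * 32),
    "conv1_1": conv_params(3, 32, 32),
    "upsampling_1": conv_params(3, 32, 32),
    "upsampling_2": conv_params(3, 32, 16),
    "decoded_outputs": conv_params(3, 16, 1),
}

total_params = sum(decoder_layer_params.values())
approx_weight_bytes = total_params * 4   # assuming 4-byte float32 weights

for name, count in decoder_layer_params.items():
    print(f"{name:>16}: {count:,}")
print(f"total parameters: {total_params:,}")
print(f"approximate weight size: {approx_weight_bytes / 1000:.0f} kB")
```

On this estimate the decoder weights are a few hundred kilobytes, which is small next to the compressed data itself. The result can be compared against `decoder.summary()`.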

Finally, we have our compressed data and our decoder model. Let’s see the code to reconstruct the data using these two.

import numpy as np
from keras.models import model_from_json

# reading compressed data (one value per line)
with open('compressed_data.txt') as data_file:
    data = data_file.readlines()

# convert to floats and regroup into vectors of 32 values, one vector per image
compressed_data = [float(x.strip()) for x in data]
compressed_data = [compressed_data[i:i+32] for i in range(0, len(compressed_data), 32)]

# load decoder model and its weights
with open('decoder.json', 'r') as json_file:
    loaded_json_model = json_file.read()
decoder = model_from_json(loaded_json_model)
decoder.load_weights('decoder.h5')

decoded_imgs = decoder.predict(np.array(compressed_data))

Above is the output from our decoder model.
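Looking at the images is one check. A numeric check is to compare each reconstruction with its original using mean squared error (the same loss the autoencoder was trained with) or the peak signal-to-noise ratio derived from it. The sketch below is one minimal way to do that in plain Python. The function names and the sample pixel values are made up for illustration, and pixel values are assumed to lie in [0, 1] as in the preprocessing step.

```python
import math

def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences of pixel values."""
    original = list(original)
    reconstructed = list(reconstructed)
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in decibels; higher means a closer match."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf   # identical inputs
    return 10 * math.log10(max_value ** 2 / error)

# made-up pixel values standing in for one flattened image and its reconstruction
original_pixels = [0.0, 0.5, 1.0, 1.0]
decoded_pixels = [0.1, 0.5, 0.9, 1.0]

example_mse = mse(original_pixels, decoded_pixels)
example_psnr = psnr(original_pixels, decoded_pixels)
print("MSE:", example_mse)
print("PSNR (dB):", example_psnr)
```

For real data, a flattened image such as `output_X_train[0].ravel()` and the matching `decoded_imgs[0].ravel()` could be passed in the same way.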

It looks fascinating to compress data to a smaller size and get nearly the same data back when we need it, but there are some real problems with this method.

The problem is that autoencoders do not generalize well: an autoencoder can only reconstruct images similar to those it was trained on. But with the advancement of deep learning, the day is not far away when you will use this type of compression.

Hope you enjoyed reading.

If you have any doubts or suggestions, please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
