In the previous blogs, we discussed binary and multi-class classification problems. Both rest on the same basic assumption: each image contains only one class. For instance, in the dogs vs. cats classification, we assumed that an image contains either a cat or a dog, but not both. In this blog, we will discuss the case where more than one class can be present in a single image. This type of classification is known as multi-label classification. The picture below explains this concept beautifully.
Some of the most common techniques for solving multi-label classification problems are
- Problem Transformation
- Adapted Algorithm
- Ensemble approaches
Here, we will only discuss Binary Relevance, a method that falls under the Problem Transformation category. If you are curious about the other methods, you can read this amazing review paper.
In binary relevance, we break the problem into a number of independent binary classification problems: for each available class, we ask whether it is present in the image or not. As we already know, binary classification uses ‘sigmoid‘ as the last-layer activation function and ‘binary_crossentropy‘ as the loss function, so we will use the same here. Everything else stays the same.
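As a minimal sketch of this idea (with placeholder layer sizes and input shape, not the model we build later), the only changes compared to a multi-class network are the output activation and the loss:

from keras.models import Sequential
from keras.layers import Dense

num_classes = 25  # placeholder: one output unit per class

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(100,)))  # placeholder feature input
# Multi-class (single label): Dense(num_classes, activation='softmax') + 'categorical_crossentropy'
# Multi-label (binary relevance): one independent sigmoid per class + 'binary_crossentropy'
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')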
Now, let’s take a dataset and see how to implement multi-label classification.
Problem Definition
Here, we will take the most common problem of this kind: movie genre classification based on poster images. A movie can belong to more than one genre, for instance, comedy and romance, and hence this is a multi-label classification problem.
Dataset
You can download the original dataset from here. This contains two files.
- Movie_Poster_Dataset.zip – The poster images
- Movie_Poster_Metadata.zip – Metadata of each poster image like ID, genres, box office, etc.
To prepare the dataset, we need images and corresponding genre information. For this, we need to extract the genre information from the Movie_Poster_Metadata.zip file corresponding to each poster image. Let’s see how to do this.
Note: This dataset contains some missing items. For instance, check the “1982” folder in Movie_Poster_Dataset.zip and Movie_Poster_Metadata.zip: the number of poster images does not match the metadata, and for some movies either the poster or the corresponding genre information is missing. So, we need to perform some EDA and remove these entries.
Steps to perform EDA:
- First, we will extract the movie name and corresponding genre information from the Movie_Poster_Metadata.zip file and create a Pandas dataframe using these.
- Then we will loop over the poster images in the Movie_Poster_Dataset.zip file and check if it is present in the dataframe created above. If the poster is not present, we will remove that movie from the dataframe.
These two steps will ensure that we are only left with movies that have poster images and genre information. Below is the code for this.
Because the encoding of some metadata files is different, the code uses two for loops. Below are the steps performed in the code.
- First, open the metadata file
- Read line by line
- Extract the information corresponding to the ‘Genre’ and ‘imdbID’
- Append them to lists and create a dataframe
import pandas as pd

b = []    # list of genre lists, one per movie
b2 = []   # list of poster filenames (imdbID + '.jpg')

# The metadata files for 1980-1981 are read with the default encoding,
# the rest as UTF-16-LE, hence the two loops.
for i in range(1980, 1982):
    with open('D:/downloads/Movie_Poster_Dataset/groundtruth/{}.txt'.format(i), mode="r") as f:
        for lines in f.readlines():
            lines = lines.rstrip('\n')
            if 'imdbID' in lines:
                a2, b3, c2 = lines.partition(':')
                c2 = c2.lstrip(' "')
                c2 = c2.rstrip('",\n')
                b2.append(c2 + '.jpg')
            if "Genre" in lines:
                a, b1, c = lines.partition(':')
                c = c.lstrip(' "')
                c = c.rstrip('",\n')
                c1 = c.split(',')
                c1 = map(str.strip, c1)
                b.append(list(c1))

for i in range(1982, 2016):
    with open('D:/downloads/Movie_Poster_Dataset/groundtruth/{}.txt'.format(i), mode="r", encoding='utf-16-le') as f:
        for lines in f.readlines():
            lines = lines.rstrip('\n')
            if 'imdbID' in lines:
                a2, b3, c2 = lines.partition(':')
                c2 = c2.lstrip(' "')
                c2 = c2.rstrip('",\n')
                b2.append(c2 + '.jpg')
            if "Genre" in lines:
                a, b1, c = lines.partition(':')
                c = c.lstrip(' "')
                c = c.rstrip('",\n')
                c1 = c.split(',')
                c1 = map(str.strip, c1)
                b.append(list(c1))

# 'name' holds the poster filename, 'filename' holds the list of genres
data = pd.DataFrame({'name': b2, 'filename': b})
Now, for the second step, we first collect all the poster image filenames in a list.
import os

q = []   # filenames of all poster images on disk
for i in range(1980, 2016):
    for files in os.listdir('D:/downloads/Movie_Poster_Dataset/Movie_Poster_Dataset/{}'.format(i)):
        q.append(files)
Then we check whether each name in the dataframe is present in this list. If not, we remove those rows from the dataframe (here, by creating a new, filtered dataframe).
# Keep only the movies whose poster image actually exists on disk
new = list(set(data['name']).intersection(q))
data2 = data[data['name'].isin(new)]
Make sure that there are no duplicates in the dataframe.
data2.drop_duplicates(subset='name', keep='first', inplace=True)
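As an optional check (not part of the original code), you can quickly verify the counts on the cleaned dataframe:

# Quick sanity check on the cleaned dataframe (assumes data2 from above)
print(data2.shape[0])                                             # number of movies left after cleaning
all_genres = set(g for genres in data2['filename'] for g in genres)
print(len(all_genres))                                            # number of distinct genre classes
print(data2['name'].duplicated().sum())                           # should be 0 after drop_duplicates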
So, finally, we are left with a cleaned dataset of 8052 images spanning 25 classes in total. The dataframe is shown below.
One can also convert this dataframe into the more common one-hot (binary indicator) format, as shown below.
This can be done using the following code.
# Here, df is assumed to be a copy of the data2 dataframe built above;
# each genre gets its own 0/1 column.
for idx, row in df.iterrows():
    for genre in row.filename:
        df.loc[idx, genre] = '1'
df.fillna('0', inplace=True)
In this post, we will be using Format 1; you can use either. Here, we will use the Keras flow_from_dataframe method. For this, we need to place all the images in one directory. Currently, the images are in separate folders such as 1980, 1981, etc. Below is the code that places all the poster images in a single folder, ‘original_train‘.
import shutil

# Copy every poster image from the per-year folders into a single directory
original_train = 'D:/downloads/Movie_Poster_Dataset/Data/'
for i in range(1980, 2016):
    for files in os.listdir('D:/downloads/Movie_Poster_Dataset/Movie_Poster_Dataset/{}'.format(i)):
        src = os.path.join('D:/downloads/Movie_Poster_Dataset/Movie_Poster_Dataset/{}'.format(i), files)
        out = os.path.join(original_train, files)
        shutil.copy(src, out)
Model Architecture
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(16, (3,3), activation='relu', input_shape=(400,300,3)))
model.add(Dropout(0.25))
model.add(Conv2D(16, (3,3), activation='relu'))
model.add(MaxPool2D((2,2)))
model.add(Dropout(0.5))
model.add(Conv2D(32, (3,3), activation='relu'))
model.add(Conv2D(32, (3,3), activation='relu'))
model.add(Dropout(0.5))
model.add(MaxPool2D((2,2)))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPool2D((2,2)))
model.add(Dropout(0.5))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(Conv2D(64, (1,1), activation='relu'))
model.add(MaxPool2D(pool_size=(4,4)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
# 25 genre classes; sigmoid gives each class an independent probability
model.add(Dense(25, activation='sigmoid'))
Since this is a sparse multi-label classification problem, accuracy is not a good metric. The reason for this is shown below.
If the predicted output was [0, 0, 0, 0, 0, 1] and the correct output was [0, 0, 0, 0, 0, 0], the accuracy would still be 5/6.
So, you can use other metrics like precision, recall, f1 score, hamming loss, top_k_categorical_accuracy, etc.
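As a quick illustration (scikit-learn is not used in the original code), hamming_loss and f1_score work directly on binarized multi-label predictions; below is a toy sketch with made-up arrays:

import numpy as np
from sklearn.metrics import hamming_loss, f1_score

# Toy example with made-up labels, just to show the metric calls
y_true = np.array([[0, 0, 0, 0, 0, 0],
                   [1, 0, 1, 0, 0, 0]])
y_pred = np.array([[0, 0, 0, 0, 0, 1],
                   [1, 0, 0, 0, 0, 0]])

print(hamming_loss(y_true, y_pred))               # fraction of wrongly predicted labels
print(f1_score(y_true, y_pred, average='micro'))  # micro-averaged F1 over all labels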
model.compile(optimizer='Adam', loss="binary_crossentropy", metrics=["accuracy", 'top_k_categorical_accuracy'])
Here, I’ve used both to show how accuracy reaches 90+ right from the first epoch and is therefore not a meaningful metric.
flow_from_dataframe()
Here, I split the data into training and validation sets using the validation_split argument of ImageDataGenerator. You can read more about the ImageDataGenerator here.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1/255., validation_split=0.1)

# x_col='name' is the poster filename; y_col='filename' holds the list of genres
train_generator = datagen.flow_from_dataframe(dataframe=data2,
                                              directory=original_train,
                                              x_col='name',
                                              y_col='filename',
                                              target_size=(400,300),
                                              color_mode='rgb',
                                              class_mode='categorical',
                                              batch_size=30,
                                              shuffle=True,
                                              subset='training',
                                              seed=7)

validation_generator = datagen.flow_from_dataframe(dataframe=data2,
                                                   directory=original_train,
                                                   x_col='name',
                                                   y_col='filename',
                                                   target_size=(400,300),
                                                   color_mode='rgb',
                                                   class_mode='categorical',
                                                   batch_size=35,
                                                   shuffle=False,
                                                   subset='validation',
                                                   seed=7)
Below are some of the poster images, all resized to (400, 300, 3).
import matplotlib.pyplot as plt

plt.figure(figsize=(10,5))
for i in range(6):
    plt.subplot(2, 3, i+1)
    for x, y in train_generator:
        print(x.shape)
        plt.imshow(x[0])
        plt.xticks([])
        plt.yticks([])
        break
plt.tight_layout()
plt.show()
You can also check which labels are assigned to which class using the following code.
train_generator.class_indices |
This prints a dictionary containing class names as keys and labels as values.
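If you need to map a predicted index back to its genre name (as the prediction code later does via class_indices), you can invert this dictionary. A small sketch:

# Invert class_indices so a predicted index can be mapped back to its genre name
idx_to_class = {v: k for k, v in train_generator.class_indices.items()}
print(idx_to_class[0])   # genre name assigned to label 0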
Let’s start training…
train_steps = train_generator.n // train_generator.batch_size
validation_steps = validation_generator.n // validation_generator.batch_size

history = model.fit_generator(train_generator,
                              steps_per_epoch=train_steps,
                              epochs=10,
                              validation_data=validation_generator,
                              validation_steps=validation_steps)
See how accuracy reaches 90+ within a few epochs. As stated earlier, this is not a good evaluation metric for multi-label classification. On the other hand, top_k_categorical_accuracy shows us the true picture.
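To see this visually, you can plot both metrics from the training history. This snippet is only a sketch, and depending on the Keras version the history key may be 'acc' rather than 'accuracy':

import matplotlib.pyplot as plt

# Compare plain accuracy with top-k accuracy over the epochs
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['top_k_categorical_accuracy'], label='top_k_categorical_accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()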
Clearly, we are doing a pretty decent job, considering that the training data is small and the problem is complex (25 classes). Moreover, some classes, such as comedy, dominate the training data. Play with the model architecture and other hyperparameters and check how the accuracy varies.
Prediction time
For each image, let’s predict the top three classes. Below is the code for this.
import cv2
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

img = load_img('D:/downloads/Movie_Poster_Dataset/Data/tt0080854.jpg')
img1 = img_to_array(img)
img2 = cv2.resize(img1, (300, 400))   # cv2.resize takes (width, height), giving a 400x300 image
img2 = img2 / 255
img3 = np.expand_dims(img2, axis=0)

proba = model.predict(img3)
top_3 = np.argsort(proba[0])[:-4:-1]  # indices of the three highest probabilities
for i in range(3):
    print("{}".format(list(train_generator.class_indices.keys())[top_3[i]]) + " ({:.3})".format(proba[0][top_3[i]]))
plt.imshow(img)
The actual labels for this image can be found as follows.
data2[data2['name']=='tt0080854.jpg'] |
You can see that our model is doing a decent job considering the complexity of the problem.
Let’s try another example, “tt0465602.jpg“. For this, the predicted labels are
Looking at the poster, most of us would predict labels similar to those predicted by our algorithm. In fact, these are pretty close to the true labels, which are [Action, Comedy, Crime].
That’s all for the multi-label classification problem. Hope you enjoyed reading.
If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve myself. Good-bye until next time.