In the previous blog, we discussed how global thresholding can be a tedious task when dealing with images having non-uniform illumination. This is because you need to ensure that while subdividing an image, each sub-image histogram is bimodal. Otherwise, the segmentation task will fail.
In this blog, we will discuss adaptive thresholding, which works well under varying conditions such as non-uniform illumination. Here, the threshold value is calculated separately for each pixel using some statistic obtained from its neighborhood. This way we get different thresholds for different image regions, which tackles the problem of varying illumination.
The whole procedure can be summed up as:
For each pixel in the image
Calculate the statistics (such as mean, median, etc.) from its neighborhood. This will be the threshold value for that pixel.
Compare the pixel value with this threshold
Now, let’s discuss the OpenCV function for adaptive thresholding.
thresholdType: This tells us what value to assign to pixels greater/less than the threshold. Must be either THRESH_BINARY or THRESH_BINARY_INV. (You can read more about it here).
maxValue: This is the value assigned to the pixels after thresholding. This depends on the thresholding type. If the type is cv2.THRESH_BINARY, all the pixels greater than the threshold are assigned this maxValue.
adaptiveMethod: This tells us how the threshold is calculated from the pixel neighborhood. This currently supports two methods:
cv2.ADAPTIVE_THRESH_MEAN_C: In this, the threshold value is the mean of the neighborhood area.
cv2.ADAPTIVE_THRESH_GAUSSIAN_C: In this, the threshold value is the weighted sum of the neighborhood area. The weights are Gaussian, computed using the getGaussianKernel() method. You can read more about it here.
blockSize: This is the neighborhood size. It must be an odd number (3, 5, 7, and so on).
C: a constant that is subtracted from the computed mean or weighted mean to obtain the final threshold.
As discussed, OpenCV only provides the mean and the Gaussian-weighted mean to serve as the threshold. But don’t limit yourself to these two statistics. Try other statistics like standard deviation, median, etc. by writing your own helper function. Let’s see how to use this.
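A minimal sketch of cv2.adaptiveThreshold() (the image filename and the blockSize/C values below are just placeholders):

import cv2

img = cv2.imread('image.jpg', 0)          # hypothetical greyscale image

# mean of the 11x11 neighborhood minus C=2 serves as the per-pixel threshold
th_mean = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 11, 2)

# Gaussian-weighted mean of the 11x11 neighborhood minus C=2
th_gauss = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)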
In the previous blogs, we discussed different methods for automatically finding the global threshold for an image. For instance, the iterative method, Otsu’s method, etc. In this blog, we will discuss another very simple approach for automatic thresholding – Balanced histogram thresholding. As clear from the name, this method tries to automatically find the threshold by balancing the image histogram. Let’s understand this method in detail.
Note: This method assumes that the image histogram is bimodal and a reasonable contrast ratio exists between the background and the region of interest.
Concept
Suppose you have a perfectly balanced histogram, i.e. a histogram where the distributions of the background and the ROI are the same. If you place such a histogram over a lever, it will be balanced, and the optimum threshold will be at the center of the lever, as shown in the figure below.
This is the main idea behind the Balanced Histogram Thresholding. This method tries to balance the image histogram and then infer the threshold value from that.
But in real-life situations, we don’t encounter images with such perfectly balanced histograms. So, let’s see how this method balances the unbalanced histograms.
First, it places the histogram over the lever and calculates the center point.
Then it calculates the left-side and right-side weights about this center point.
It removes weight from the heavier side and adjusts the center.
The last two steps are repeated until the start and end points are equal to the center.
The whole procedure can be summed up in the below gif (taken from Wikipedia)
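A minimal Python sketch of this balancing procedure, adapted from the pseudocode on the Wikipedia page (it takes a 256-bin histogram and returns the balance point as the threshold):

import numpy as np

def balanced_hist_threshold(hist):
    # hist: 1-D array of 256 bin counts
    i_s, i_e = 0, len(hist) - 1              # start and end points of the lever
    i_m = (i_s + i_e) // 2                   # center of the lever
    w_l = np.sum(hist[i_s:i_m + 1])          # weight on the left side
    w_r = np.sum(hist[i_m + 1:i_e + 1])      # weight on the right side
    while i_s <= i_e:
        if w_r > w_l:                        # right side is heavier
            w_r -= hist[i_e]
            i_e -= 1
            if (i_s + i_e) // 2 < i_m:
                w_r += hist[i_m]
                w_l -= hist[i_m]
                i_m -= 1
        else:                                # left side is heavier
            w_l -= hist[i_s]
            i_s += 1
            if (i_s + i_e) // 2 >= i_m and i_m + 1 < len(hist):
                w_l += hist[i_m + 1]
                w_r -= hist[i_m + 1]
                i_m += 1
    return i_m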
In the previous blog, we discussed global thresholding and how to find the global threshold using the iterative approach. In this blog, we will discuss Otsu’s method, named after Nobuyuki Otsu, that automatically finds the global threshold. So, let’s discuss this method in detail.
Note: This method assumes that the image histogram is bimodal and a reasonable contrast ratio exists between the background and the region of interest.
In simple terms, Otsu’s method tries to find a threshold value that minimizes the weighted within-class variance. Since variance is the spread of the distribution about the mean, minimizing the within-class variance will tend to make the classes compact.
Let’s say we threshold a histogram at a value “t”. This produces two regions – left and right of “t” – whose variances are given by $\sigma_0^2(t)$ and $\sigma_1^2(t)$. Then the weighted within-class variance is given by

$\sigma_w^2(t) = w_0(t)\,\sigma_0^2(t) + w_1(t)\,\sigma_1^2(t)$

where $w_0(t)$ and $w_1(t)$ are the weights given to each class. Weights are the total pixels in a thresholded region (left or right) divided by the total image pixels. Let’s take a simple example to understand how to calculate these.
Suppose we have the following histogram and we want to find the weighted within-class variance corresponding to threshold value 1.
Below are the weights and the variances calculated for the left and right regions obtained after thresholding at value 1.
Similarly, we will iterate over all the possible threshold values and calculate the weighted within-class variance for each of them. The optimum threshold will be the one with the minimum within-class variance.
The gif below shows how the within-class variance (blue dots) varies with the threshold value for the above histogram. The optimum threshold value is the one where the within-class variance is minimum.
OpenCV also provides a builtin function to calculate the threshold using this method.
OpenCV
You just need to pass an extra flag, cv2.THRESH_OTSU in the cv2.threshold() function which we discussed in the previous blog. The optimum threshold value will be returned by this along with the thresholded image. Let’s see how to use this.
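A minimal sketch of this (the filename is just a placeholder; passing 0 as the threshold lets Otsu’s method pick it):

import cv2

img = cv2.imread('image.jpg', 0)    # hypothetical greyscale image

# the extra cv2.THRESH_OTSU flag makes OpenCV compute the threshold itself
ret, otsu_img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(ret)                          # the optimum threshold found by Otsu's method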
We all know that minimizing the within-class variance is equivalent to maximizing the between-class variance. This maximization can be implemented recursively, using running sums, and is faster than the earlier method. The expression for the between-class variance is given by

$\sigma_b^2(t) = w_0(t)\,w_1(t)\,[\mu_0(t) - \mu_1(t)]^2$

where $\mu_0(t)$ and $\mu_1(t)$ are the means of the two classes.
Below are the steps to calculate recursively between-class variance.
Calculate the histogram of the image.
Set up weights and means corresponding to the “0” threshold value.
Loop through all the threshold values
Update the weights and the mean
Calculate the between-class variance
The optimum threshold will be the one with the max variance.
Below is the code in Python that implements the above steps.
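A minimal sketch of these steps (the filename is a placeholder; running sums give a cheap update per threshold):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)                        # hypothetical greyscale image

# Step 1: histogram of the image
hist = cv2.calcHist([img], [0], None, [256], [0, 256]).ravel()
total = hist.sum()
bins = np.arange(256)
sum_total = (bins * hist).sum()

# Step 2: weights and sums corresponding to the "0" threshold value
w0, sum0 = 0.0, 0.0
best_var, best_t = -1.0, 0

# Step 3: loop through all the threshold values
for t in range(256):
    w0 += hist[t]                                       # update weight of class 0
    if w0 == 0:
        continue
    w1 = total - w0                                     # weight of class 1
    if w1 == 0:
        break
    sum0 += t * hist[t]
    m0 = sum0 / w0                                      # mean of class 0
    m1 = (sum_total - sum0) / w1                        # mean of class 1
    var_between = w0 * w1 * (m0 - m1) ** 2              # between-class variance (up to a constant)
    if var_between > best_var:
        best_var, best_t = var_between, t

print(best_t)                                           # optimum threshold (maximum variance)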
This is how you can implement Otsu’s method recursively if you consider maximizing the between-class variance. Now, let’s discuss the limitations of this method.
Limitations
Otsu’s method is only guaranteed to work when
The histogram is bimodal.
A reasonable contrast ratio exists between the background and the ROI.
The lighting conditions are uniform.
The image is not affected by noise.
The sizes of the background and the ROI are comparable.
There are many modifications done to the original Otsu’s algorithm to address these limitations such as two-dimensional Otsu’s method etc. We will discuss some of these modifications in the following blogs.
In the following blogs, we will also discuss how to counter these limitations so as to get satisfactory results with Otsu’s method. Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In the previous blog, we discussed Otsu’s method for automatic image thresholding and its limitations. In this blog, we will discuss how to handle these limitations so as to produce satisfactory thresholding results. So, let’s get started.
Case-1: When the noise is present in the image
If noise is present in the image, it tends to change the modality of the histogram; the sharp valleys between the peaks of the bimodal histogram start degrading. In that case, Otsu’s method or any other global thresholding method will fail. So, in order to find the global threshold, one should first remove the noise using a smoothing filter like a Gaussian, etc. and then apply an automatic thresholding method like Otsu’s.
Case-2: When the object area is small compared to the background area
In this case, the image histogram will be dominated by the large background area. This increases the probability of any pixel belonging to the background, so the histogram no longer exhibits bimodality and Otsu’s method will result in segmentation errors. To prevent this, one should only consider pixels that lie on or near the edges between the objects and the background. Doing so results in an image histogram with peaks of approximately the same size. Then we can apply any automatic thresholding method like Otsu’s. Below are the steps to implement the above procedure, followed by a small code sketch.
Calculate the edge image using any high pass filter like Sobel, Laplacian, etc.
Select any threshold value (T).
Threshold the above edge image to produce a binary mask.
Apply the mask image on the input image using any bitwise operations or any other method.
This keeps only those pixels where the mask image is white.
Compute the histogram of these pixels only.
Finally, apply any automatic global thresholding method like otsu, etc.
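A minimal sketch of these steps (the filename, the Laplacian as the high-pass filter, and the 99th-percentile edge threshold are all placeholder choices):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)                       # hypothetical greyscale image

# Steps 1-3: edge image, threshold it to get a binary mask of strong edges
edges = np.absolute(cv2.Laplacian(img, cv2.CV_64F))
T = np.percentile(edges, 99)                           # keep only the strongest edge responses
mask = edges >= T

# Steps 4-6: keep only the pixels under the mask and use their histogram
edge_pixels = img[mask].reshape(-1, 1)

# Step 7: Otsu on these pixels only, then threshold the full image with the result
t, _ = cv2.threshold(edge_pixels, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
segmented = np.uint8(img > t) * 255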
Case-3: When the image is taken under non-uniform illumination conditions
In this case, the histogram no longer remains bimodal and thus we will not be able to segment the image satisfactorily. One of the simplest approaches is to subdivide the image into non-overlapping rectangles, with the rectangle size chosen such that the illumination is nearly constant within each one. Then we apply a global thresholding technique like Otsu’s to each of these rectangles, as sketched below.
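A minimal sketch of this block-wise approach (the filename and the 2×3 grid are placeholder choices):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)            # hypothetical greyscale image
h, w = img.shape
rows, cols = 2, 3                           # assumed grid of non-overlapping rectangles
out = np.zeros_like(img)

for r in range(rows):
    for c in range(cols):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        x0, x1 = c * w // cols, (c + 1) * w // cols
        block = img[y0:y1, x0:x1]
        # Otsu on each rectangle separately
        _, out[y0:y1, x0:x1] = cv2.threshold(block, 0, 255,
                                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)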
The above procedure only works when the sizes of the object and the background are comparable within each rectangle. This is quite intuitive, as only then will we have a bimodal histogram. Taking care of the background and the object sizes in each rectangle is a tedious task.
So, in the next blog, we will discuss adaptive thresholding that works pretty well for the above conditions. That’s all for this blog. Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In the previous blog, we discussed image thresholding and when to use this for image segmentation. We also learned that thresholding can be global or adaptive depending upon how the threshold value is selected.
In this blog, we will discuss
global thresholding
OpenCV function for global thresholding
How to choose threshold value using the iterative algorithm
In global thresholding, each pixel value in the image is compared with a single (global) threshold value. Below is the code for this.
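A minimal sketch of global thresholding with NumPy (the filename and the threshold value are placeholders):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)      # hypothetical greyscale image

threshold = 150                       # assumed global threshold value
val_high = 255                        # value for pixels above the threshold
val_low = 0                           # value for the remaining pixels

out = np.where(img > threshold, val_high, val_low).astype(np.uint8)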
Here, we assign a value of “val_high” to all the pixels greater than the threshold, and “val_low” otherwise. OpenCV also provides a builtin function for thresholding the image. So, let’s take a look at that function.
OpenCV
cv2.threshold(src, thresh, maxval, type) → retval, dst
This function returns the thresholded image(dst) and the threshold value(retval). Its arguments are
src: input greyscale image (8-bit or 32-bit floating point)
thresh: global threshold value
type: Different types that decide “val_high” and “val_low“. In other words, these types decide what value to assign for pixels greater than and less than the threshold. Below figure shows different thresholding types available.
maxval: maximum value to be used with THRESH_BINARY and THRESH_BINARY_INV. Check the below image.
To specify the thresholding type, write “cv2.” as the prefix. For instance, write cv2.THRESH_BINARY if you want to use this type. Let’s take an example.
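A minimal sketch (the filename and the threshold value 150 are placeholders):

import cv2

img = cv2.imread('image.jpg', 0)     # hypothetical greyscale image

# pixels above 150 become 255 (maxval), the rest become 0
retval, dst = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY)
print(retval)                        # simply echoes the threshold we passed in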
Similarly, you can apply other thresholding types to check how they work. Till now we discussed how to threshold an image using a global threshold value. But we didn’t discuss how to get this threshold value. So, in the next section, let’s discuss this.
How to choose the threshold value?
As already discussed, global thresholding is a suitable approach only when the intensity distributions of the background and the ROI are sufficiently distinct, i.e. there is a clear valley between the peaks of the histogram. We can easily select the threshold value manually in that situation. But what if we have a large number of images? In that case, we don’t want to first check each image histogram and then decide the threshold value by hand. We want something that can automatically estimate the threshold value for each image. Below is the algorithm that can be used for this purpose.
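A minimal sketch of the standard iterative (mean-based) threshold selection, which is presumably the algorithm meant here (the filename and the 0.5 convergence tolerance are placeholders):

import cv2
import numpy as np

img = cv2.imread('image.jpg', 0).astype(np.float64)   # hypothetical greyscale image

T = img.mean()                      # step 1: initial estimate of the threshold
while True:
    m1 = img[img > T].mean()        # mean of the pixels above the threshold
    m2 = img[img <= T].mean()       # mean of the remaining pixels
    T_new = (m1 + m2) / 2           # new estimate
    if abs(T_new - T) < 0.5:        # stop once the estimate settles
        break
    T = T_new

print(T_new)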
Image segmentation is the process of subdividing an image into its constituent regions or objects. In many computer vision applications, image segmentation is very useful for detecting the region of interest, for instance, in medical imaging where we have to locate tumors, in object detection where self-driving cars have to detect pedestrians, traffic signals, etc., or in video surveillance. There are a number of methods available to perform image segmentation, for instance, thresholding, clustering methods, graph partitioning methods, and convolutional methods, to mention a few.
In this blog, we will discuss Image Thresholding which is one of the simplest methods for image segmentation. In this, we partition the images directly into regions based on the intensity values. So, let’s discuss image thresholding in greater detail.
Concept
If the pixel value is greater than a threshold value, it is assigned one value (maybe white), else it is assigned another value (maybe black).
In other words, if f(x,y) is the input image then the segmented image g(x,y) is given by
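In equation form, with T as the threshold and 1/0 standing in for the two output values:

$$g(x,y) = \begin{cases} 1, & \text{if } f(x,y) > T \\ 0, & \text{if } f(x,y) \le T \end{cases}$$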
If the threshold value T remains constant over the entire image, then this is known as global thresholding. When the value of T changes over the entire image or depends upon the pixel neighborhood, then this is known as adaptive thresholding. We will cover both these types in greater detail in the following blogs.
Applicability Condition
Thresholding is only guaranteed to work when a good contrast ratio between the region of interest and the background exists. Otherwise, the thresholding will not be able to fully detect the region of interest. Let’s understand this by an example.
Suppose we have two images from which we want to segment the square region (our region of interest) from the background.
Let’s plot the histogram of these two images.
Clearly, as expected, the histogram for “A“ shows two peaks corresponding to the square and the background. The separation between the peaks shows that the background and the ROI have a good contrast ratio. By choosing a threshold value between the peaks, we will be able to segment out the ROI. For “B”, on the other hand, the intensity distributions of the ROI and the background are not that distinct, so we may not be able to fully segment the ROI.
Thresholded images are shown below (How to choose a threshold value will be discussed in the next blog).
So, always plot the image histogram to check the contrast ratio between the background and the ROI. Only if the contrast ratio is good, choose the thresholding method for image segmentation. Otherwise, look for other methods.
In the next blog, we will discuss global thresholding and how to choose the threshold value using the iterative method. Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In the previous blog, we discussed the binary classification problem, where each image can contain only one class out of two classes. So, in this blog, we will extend this to the multi-class classification problem. In a multi-class problem, we classify each image into one of three or more classes. So, let’s get started.
Here, we will use the CIFAR-10 dataset, developed by the Canadian Institute for Advanced Research (CIFAR). The CIFAR-10 dataset consists of 60000 (32×32) color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The classes are completely mutually exclusive. Below are the classes in the dataset, as well as 10 random images from each class.
CIFAR-10 dataset can be downloaded by using any of the two methods:
Using Keras builtin datasets
From the official website
Method-1
Downloading using the Keras builtin datasets is pretty straightforward and simple. It’s already transformed into the shape appropriate for the CNN input. No headache, just write one line of code and you are done.
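That one line, using the Keras builtin datasets module, looks like this:

from keras.datasets import cifar10

# downloads CIFAR-10 on first use and returns arrays already shaped (N, 32, 32, 3)
(train_x, train_y), (test_x, test_y) = cifar10.load_data()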
Method-2
The data can also be downloaded from the official website. The only catch is that it is not in a format that can be fed directly to the model. Let’s see how the dataset is arranged.
The dataset is broken into 5 files so as to prevent your machine from running out of memory. Each file contains a dictionary of data and the corresponding labels. Data is a 10000×3072 array where 10000 is the number of images and 3072 are the pixel values in row-major order. So, the first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. You need to convert it into a (32,32) color image.
Steps:
First, unpickle all the train and test files
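Unpickling can be done with a small helper along the lines of the one given on the CIFAR website (a sketch; the key point is the 'bytes' encoding):

import pickle

def unpickle(file):
    # load one CIFAR-10 batch file into a dictionary of data and labels
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    return batch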
Then convert the image format to (width x height x num_channel)
import numpy as np

# convert the image format to (width x height x num_channel)
b = np.reshape(train_x, (50000, 3, 32, 32))
train_x = np.transpose(b, (0, 2, 3, 1))

# Normalize the data between 0 and 1
train_x = train_x.astype('float32') / 255
train_y = np.expand_dims(train_y, axis=-1)
Split the data into train and validation
Because the training data contains images in random order, a simple split will be sufficient. Another way is to take some percentage of images from each of the 5 train files to constitute a validation set.
x_train = train_x[:45000]
y_train = train_y[:45000]
val_x = train_x[45000:]
val_y = train_y[45000:]
To make sure that this splitting leads to the uniform proportion of examples for each class, we can plot the counts of each class in the validation dataset. Below is the bar plot. Looks like all the classes are uniformly distributed in the validation set.
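A small sketch of how such a bar plot can be produced from the split above (val_y comes from the split; class names are omitted for brevity):

import numpy as np
import matplotlib.pyplot as plt

# count how many validation images fall in each of the 10 classes
classes, counts = np.unique(val_y, return_counts=True)
plt.bar(classes, counts)
plt.xlabel('class id')
plt.ylabel('number of validation images')
plt.show()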
Model Architecture
Since the images contain a diverse amount of information, we will need a bigger network. But the bigger the network, the higher the chances of overfitting. So, to prevent this we may need to apply some regularization techniques.
In the previous blogs, we discussed binary and multi-class classification problems. Both of these are quite similar: the basic assumption underlying them is that each image can contain only one class. For instance, for the dogs vs cats classification, it was assumed that the image can contain either a cat or a dog but not both. So, in this blog, we will discuss the case where more than one class can be present in a single image. This type of classification is known as multi-label classification. The picture below explains this concept beautifully.
Some of the most common techniques for solving multi-label classification problems are
Problem Transformation
Adapted Algorithm
Ensemble approaches
Here, we will only discuss Binary Relevance, a method that falls under the Problem Transformation category. If you are curious about other methods, you can read this amazing review paper.
In binary relevance, we break the problem into a number of binary classification problems: for each available class, we ask whether it is present in the image or not. As we already know, binary classification uses ‘sigmoid‘ as the last-layer activation function and ‘binary_crossentropy‘ as the loss function, so we will use the same here. Everything else remains the same.
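To make this concrete, a minimal sketch of a model set up for binary relevance (the network body, the 25 classes, and the input size are assumptions for this dataset):

from keras import layers, models

num_classes = 25                                             # assumed number of genres

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(200, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(num_classes, activation='sigmoid'))   # one independent probability per label

model.compile(optimizer='adam',
              loss='binary_crossentropy',                    # binary cross-entropy per label
              metrics=['accuracy', 'top_k_categorical_accuracy'])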
Now, let’s take a dataset and see how to implement multi-label classification.
Problem Definition
Here, we will take the common problem of movie genre classification based on poster images. Because a movie can belong to more than one genre, for instance, comedy, romance, etc., this is a multi-label classification problem.
Dataset
You can download the original dataset from here. This contains two files.
Movie_Poster_Dataset.zip – The poster images
Movie_Poster_Metadata.zip – Metadata of each poster image like ID, genres, box office, etc.
To prepare the dataset, we need images and corresponding genre information. For this, we need to extract the genre information from the Movie_Poster_Metadata.zip file corresponding to each poster image. Let’s see how to do this.
Note: This dataset contains some missing items. For instance, check the “1982” folder in Movie_Poster_Dataset.zip and Movie_Poster_Metadata.zip: for some movies, the poster image or the corresponding genre information is missing. So, we need to perform EDA and remove these files.
Steps to perform EDA:
First, we will extract the movie name and corresponding genre information from the Movie_Poster_Metadata.zip file and create a Pandas dataframe using these.
Then we will loop over the poster images in the Movie_Poster_Dataset.zip file and check if it is present in the dataframe created above. If the poster is not present, we will remove that movie from the dataframe.
These two steps will ensure that we are only left with movies that have poster images and genre information. Below is the code for this.
Because the encoding of some metadata files is different, two for loops are used. Below are the steps performed in the code.
First, open the metadata file
Read line by line
Extract the information corresponding to the ‘Genre’ and ‘imdbID’
Append them into the list and create a dataframe
import pandas as pd

b = []          # list of genre lists
b2 = []         # list of poster filenames (imdbID + '.jpg')

# metadata for 1980-1981 is plain text
for i in range(1980, 1982):
    with open('D:/downloads/Movie_Poster_Dataset/groundtruth/{}.txt'.format(i), mode="r") as f:
        for lines in f.readlines():
            lines = lines.rstrip('\n')
            if 'imdbID' in lines:
                a2, b3, c2 = lines.partition(':')
                c2 = c2.lstrip(' "')
                c2 = c2.rstrip('",\n')
                b2.append(c2 + '.jpg')
            if "Genre" in lines:
                a, b1, c = lines.partition(':')
                c = c.lstrip(' "')
                c = c.rstrip('",\n')
                c1 = c.split(',')
                c1 = map(str.strip, c1)
                b.append(list(c1))

# metadata for 1982-2015 is UTF-16 encoded
for i in range(1982, 2016):
    with open('D:/downloads/Movie_Poster_Dataset/groundtruth/{}.txt'.format(i), mode="r", encoding='utf-16-le') as f:
        for lines in f.readlines():
            lines = lines.rstrip('\n')
            if 'imdbID' in lines:
                a2, b3, c2 = lines.partition(':')
                c2 = c2.lstrip(' "')
                c2 = c2.rstrip('",\n')
                b2.append(c2 + '.jpg')
            if "Genre" in lines:
                a, b1, c = lines.partition(':')
                c = c.lstrip(' "')
                c = c.rstrip('",\n')
                c1 = c.split(',')
                c1 = map(str.strip, c1)
                b.append(list(c1))

data = pd.DataFrame({'name': b2, 'filename': b})
Now for the second step, we first collect all the poster image filenames in a list, as sketched below.
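A sketch of this step (the folder layout, one sub-folder per year next to the groundtruth files, is an assumption):

import os

# collect the filenames of all the poster images
poster_files = []
for year in range(1980, 2016):
    folder = 'D:/downloads/Movie_Poster_Dataset/{}'.format(year)   # assumed layout
    poster_files.extend(os.listdir(folder))

# keep only the movies whose poster image actually exists
data = data[data['name'].isin(poster_files)].reset_index(drop=True)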
So, finally, we are ready with our cleaned dataset with 8052 images containing overall 25 classes. The dataframe is shown below.
Format 1
One can also convert this dataframe into the common format as shown below
Format 2
This can be done using the following code.
# df: the dataframe created above; one new column per genre, marked '1' if present
for idx, row in df.iterrows():
    for genre in row.filename:
        df.loc[idx, genre] = '1'
df.fillna('0', inplace=True)
In this post, we will be using Format 1. You can use any. Here, we will be using the Keras flow_from_dataframe method. For this, we need to place all the images under one directory. Currently, all the images are in separate folders such as 1980, 1981, etc. Below is the code that places all the poster images in a single folder ‘original_train‘.
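A sketch of that step (the source and destination paths are assumptions):

import os
import shutil

src_root = 'D:/downloads/Movie_Poster_Dataset'      # assumed location of the year folders
dst = 'D:/downloads/original_train'                 # single destination folder
os.makedirs(dst, exist_ok=True)

for year in range(1980, 2016):
    folder = os.path.join(src_root, str(year))
    for fname in os.listdir(folder):
        shutil.copy(os.path.join(folder, fname), os.path.join(dst, fname))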
Here, I’ve used both accuracy and top_k_categorical_accuracy as metrics, to show how accuracy instantly reaches 90+ from the starting epoch and thus is not a correct metric for this problem.
flow_from_dataframe()
Here, I split the data into training and validation sets using the validation_split argument of ImageDataGenerator. You can read more about the ImageDataGenerator here.
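A sketch of the generators (the directory, target size, and 20% validation split are assumptions; with y_col holding lists of genres, class_mode='categorical' gives multi-hot labels):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_gen = datagen.flow_from_dataframe(dataframe=data,
                                        directory='D:/downloads/original_train',
                                        x_col='name',            # poster filenames
                                        y_col='filename',        # lists of genres (Format 1)
                                        class_mode='categorical',
                                        target_size=(200, 150),
                                        batch_size=32,
                                        subset='training')

val_gen = datagen.flow_from_dataframe(dataframe=data,
                                      directory='D:/downloads/original_train',
                                      x_col='name',
                                      y_col='filename',
                                      class_mode='categorical',
                                      target_size=(200, 150),
                                      batch_size=32,
                                      subset='validation')

model.fit_generator(train_gen, epochs=20, validation_data=val_gen)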
See how accuracy reaches 90+ within a few epochs. As stated earlier, this is not a good evaluation metric for multi-label classification. On the other hand, top_k_categorical_accuracy shows us the true picture.
Clearly, we are doing a pretty decent job, considering that the training data is small and the problem is complex (25 classes). Moreover, some classes like comedy, etc. dominate the training data. Play with the model architecture and other hyperparameters and check how the accuracy varies.
Prediction time
For each image, let’s predict the top three predicted classes. Below is the code for this.
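A sketch of this (the poster filename is one from the dataset; class_indices from the generator gives the genre names in the order of the model’s outputs):

import numpy as np
from keras.preprocessing import image

class_names = list(train_gen.class_indices.keys())

img = image.load_img('D:/downloads/original_train/tt0465602.jpg',
                     target_size=(200, 150))            # same size as during training
x = image.img_to_array(img) / 255.0
pred = model.predict(np.expand_dims(x, axis=0))[0]

top3 = np.argsort(pred)[::-1][:3]                       # indices of the three highest scores
print([class_names[i] for i in top3])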
You can see that our model is doing a decent job considering the complexity of the problem
Let’s try another example “tt0465602.jpg“. For this the predicted labels are
By looking at the poster most of us will predict the labels as predicted by our algorithm. Actually, these are pretty close to the true labels that are [Action, Comedy, Crime].
That’s all for multi-label classification problem. Hope you enjoy reading.
If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.
In neural networks, a good way to debug is to look at the relationship between the cost and the number of iterations. This not only ensures that the optimizer is working properly but can also be very useful as an indication of overfitting. Moreover, we can also debug the learning rate based on this relationship. Thus, one should always keep track of the loss and the accuracy metrics while training a neural network.
Fortunately, in Keras, we don’t need to write a single extra line of code to store all these values. Keras automatically keeps the record of all the events for each epoch. This includes loss and accuracy metrics for both training and validation sets (if used). This is done using the History callback which is automatically applied to every Keras model. This callback records all the events into a History object that gets returned by the fit() method.
How does this work?
First, at the onset of training, this creates an empty dictionary to store all the events. Then at every epoch end, all the events are appended into the dictionary. Below is the code for this taken from the Keras GitHub.
def on_train_begin(self, logs=None):
    self.epoch = []
    self.history = {}

def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}
    self.epoch.append(epoch)
    for k, v in logs.items():
        self.history.setdefault(k, []).append(v)
How to use this?
Since all the saved records are returned by the fit() method, we can simply store all the events in any variable. Here, I’ve used “record” as the variable name.
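For instance, a small self-contained sketch (an MNIST toy model, assumed purely for illustration; the fit arguments match the params shown further below):

from keras.datasets import mnist
from keras import layers, models

(x_train, y_train), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255

model = models.Sequential([layers.Dense(128, activation='relu', input_shape=(784,)),
                           layers.Dense(10, activation='softmax')])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])

# the History object returned by fit() is stored in "record"
record = model.fit(x_train, y_train,
                   batch_size=128,
                   epochs=5,
                   validation_split=0.2)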
Now, using this record object, we can retrieve any information about the training process. For instance, “record.epoch” returns the list of epochs.
record.epoch
>>>[0,1,2,3,4]
“record.history” returns the dictionary containing the event names as the dictionary keys and their values at each epoch in a list.
record.history
>>>{'val_loss':[0.037735904552973806,
0.0383092053757185,
0.03597573842055863,
0.04122187125845812,
0.04817074624549908],
'val_acc':[0.9889166665077209,
0.9899166665077209,
0.9912499998410543,
0.989416666507721,
0.988666666507721],
'loss':[0.01413803466820779,
0.010153079537209123,
0.008668457060974712,
0.008247203289516619,
0.007816496806034896],
'acc':[0.9952291666666667,
0.9967083333333333,
0.9972291666666667,
0.9971875,
0.9975208333333333]}
You can retrieve all the event names using the following command.
record.history.keys()
>>>dict_keys(['val_loss','val_acc','loss','acc'])
You can also get the information about the parameters used while fitting the model. This can be done using the following command.
record.params
>>>{'batch_size':128,
'epochs':5,
'steps':None,
'samples':48000,
'verbose':1,
'do_validation':True,
'metrics':['loss','acc','val_loss','val_acc']}
Not only this, but one can also check which data is used as the validation data using the following command.
record.validation_data[0]
These are just a few of the functionalities available with the History callback. You can check more of these on the Keras GitHub.
Plot the training history
Since all the events are stored in a dictionary, one can easily plot these using any plotting library. Here, I’m using Matplotlib. Below is the code for plotting the loss curves for both training and validation sets.
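A minimal sketch, using the keys shown above (the record object comes from the fit() call earlier):

import matplotlib.pyplot as plt

plt.plot(record.epoch, record.history['loss'], label='training loss')
plt.plot(record.epoch, record.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()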
One common problem that we face while training a neural network is of overfitting. This refers to a situation where the model fails to generalize. In other words, the model performs poorly on the test/validation set as compared to the training set. Take a look at the plot below.
Clearly, after ‘t’ epochs, the model starts overfitting. This is clear from the increasing gap between the train and the validation error in the above plot. Wouldn’t it be nice if we stopped the training where the gap starts increasing? This would help prevent the model from overfitting. This method is known as Early Stopping. Some of the pros of using this method are
Prevents the model from overfitting
Parameter-free unlike other regularization techniques like L2 etc.
Removes the need to manually set the number of epochs. Because now the model will automatically stop training when the monitored quantity stops improving.
Fortunately, in Keras, this is done using the EarlyStopping callback. So, let’s first discuss its Keras API and then we will learn how to use this.
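For reference, the callback with its default arguments (these are the arguments discussed below):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss',
                           min_delta=0,
                           patience=0,
                           verbose=0,
                           mode='auto',
                           baseline=None,
                           restore_best_weights=False)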
In this, you first need to provide which quantity to monitor using the “monitor” argument. This can take a value from ‘loss’, ‘acc’, ‘val_loss’, ‘val_acc’ or ‘val_metric’ where metric is the name of the metric used. For instance, if the metric is set to ‘mse’ then pass ‘val_mse’.
After setting the monitored quantity, you need to decide whether you want to minimize or maximize it. For instance, we want to minimize loss and maximize accuracy. This can be done using the “mode” argument. This can take value from [‘min‘, ‘max‘, ‘auto‘]. Default is the ‘auto’ mode. In ‘auto’ mode, this automatically infers whether to maximize or minimize depending upon the monitored quantity name.
This stops training whenever the monitored quantity stops improving. By default, any fractional change is considered an improvement. For instance, if ‘val_acc’ increases from 90% to 90.0001%, this is also considered an improvement. The meaning of improvement may vary from one application to another. So, here we have the “min_delta“ argument. Using this we can set the minimum change in the monitored quantity that qualifies as an improvement. For instance, if min_delta=1, then any absolute change of less than 1 will count as no improvement.
Note: This difference is calculated as the current monitored quantity value minus the best-monitored quantity value until now.
As we already know that neural networks mostly face the problem of plateaus. So monitored quantity may not show improvement for some time and then improve afterward. So, it’s better to wait for a few epochs before making the final decision to stop the training process. This can be done using the “patience” argument. For instance, a patience=3 means if the monitored quantity doesn’t improve for 3 epochs, stop the training process.
The model will stop training some epochs (specified by the “patience” argument) after the best-monitored quantity value. So, the weights you will get are not the best weights. To retrieve the best weights, set the “restore_best_weights” argument to True.
Sometimes for a task we have a baseline in mind, for instance that we should get at least 75% accuracy within 5 epochs. If we are not getting this, there is no point in training the model any further; we should instead change the hyperparameters and retrain. In this callback, you can set such a baseline using the “baseline” argument. If the monitored quantity does not surpass the baseline (by at least min_delta) within the number of epochs specified by the patience argument, the training process is stopped.
For instance, below is an example where the baseline is set to 98%.
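A sketch of such a setup (the model and data are whatever you are training, e.g. the toy model above; only the callback settings matter here):

from keras.callbacks import EarlyStopping

# stop training if validation accuracy hasn't reached 98% within 5 epochs
early_stop = EarlyStopping(monitor='val_acc',
                           baseline=0.98,
                           patience=5)

record = model.fit(x_train, y_train,
                   validation_split=0.2,
                   epochs=50,
                   callbacks=[early_stop])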