In this blog, we will discuss how to use the GrabCut algorithm for foreground extraction. When it was introduced (around 2004), GrabCut outperformed most of the available foreground extraction methods, both in the quality of the output and in the simplicity of the user input. Let’s first discuss the theory and then implement it using OpenCV. So, let’s get started.
Theory
First, we draw a rectangle around the foreground region so that the region outside the rectangle is sure background, while the region inside the rectangle contains both foreground and background, as shown below. Everything inside the rectangle is unknown, and we need to classify it into foreground and background.
Next, we initialize everything outside the rectangle to 0 (background) and everything inside it to 1 (foreground). This is our initial labeling.
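As a minimal sketch of this initial labeling, assuming the image is a NumPy array and the rectangle is given as (x, y, w, h) (the form OpenCV uses later in this post), the hypothetical helper `initial_labels` builds the 0/1 label map:

```python
import numpy as np

def initial_labels(height, width, rect):
    """Hypothetical helper: 0 (background) outside rect, 1 (foreground) inside.

    rect is (x, y, w, h); x indexes columns and y indexes rows.
    """
    x, y, w, h = rect
    labels = np.zeros((height, width), dtype=np.uint8)  # everything starts as background
    labels[y:y + h, x:x + w] = 1  # pixels inside the rectangle are tentatively foreground
    return labels

labels = initial_labels(6, 8, (2, 1, 4, 3))
print(labels)
```

Note that OpenCV's own rectangle mode marks the inside of the rectangle as probable foreground (value 3, explained later) rather than 1, so those pixels are still allowed to change during the iterations.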
The authors use a Gaussian Mixture Model (GMM) to model the foreground and background color distributions. So, we initialize 2 GMMs (one for the background and the other for the foreground), each with K components (K = 5 in the paper), from the initial labeling we did earlier. Here, the GMM parameters are the weights π, means µ, and covariances Σ of the 2K Gaussian components of the background and foreground distributions.
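Once every pixel of a region has been assigned to one of the K components, the parameters π, µ and Σ are just per-component statistics. The sketch below assumes those assignments are already given (OpenCV, for instance, initializes them with k-means); the function name `gmm_params` is hypothetical, and a real implementation would also guard against singular covariance matrices.

```python
import numpy as np

def gmm_params(pixels, components, k):
    """Estimate weights, means and covariances of a k-component colour GMM.

    pixels:     (N, 3) array of colours belonging to one region (fg or bg)
    components: (N,) array giving each pixel's component index in [0, k)
    """
    n = len(pixels)
    weights = np.zeros(k)
    means = np.zeros((k, 3))
    covs = np.zeros((k, 3, 3))
    for j in range(k):
        members = pixels[components == j].astype(np.float64)
        if len(members) == 0:
            continue  # empty component: its weight stays 0
        weights[j] = len(members) / n            # pi_j: fraction of pixels in component j
        means[j] = members.mean(axis=0)          # mu_j: mean colour of the component
        diff = members - means[j]
        covs[j] = diff.T @ diff / len(members)   # Sigma_j: (biased) 3x3 covariance
    return weights, means, covs

pixels = np.array([[0, 0, 0], [2, 2, 2], [10, 10, 10], [12, 12, 12]])
components = np.array([0, 0, 1, 1])
weights, means, covs = gmm_params(pixels, components, k=2)
```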
Using these GMMs, each pixel in the unknown region is assigned to its most likely Gaussian component, and the GMM parameters are re-estimated from these assignments. A graph is then constructed with nodes representing the pixels, as shown below. Neighboring nodes are linked by edges whose weights are defined by pixel similarity: similar pixels (in terms of color) get higher edge weights, and dissimilar pixels get lower ones.
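To make "similar pixels get higher edge weights" concrete, here is a sketch of the smoothness term from the referenced paper, γ·exp(−β‖z_m − z_n‖²) with β = 1 / (2·mean‖z_m − z_n‖²) and γ = 50. As a simplification, it only handles horizontal neighbors and estimates β from those pairs alone; the function name is hypothetical.

```python
import numpy as np

def horizontal_nlink_weights(img, gamma=50.0):
    """Edge weight between each pixel and its right-hand neighbour.

    Simplified sketch: only horizontal neighbours are linked, and beta is
    estimated from those pairs only.
    """
    z = img.astype(np.float64)
    sq_diff = ((z[:, 1:] - z[:, :-1]) ** 2).sum(axis=2)  # ||z_m - z_n||^2 per edge
    mean_sq = sq_diff.mean()
    beta = 0.0 if mean_sq == 0 else 1.0 / (2.0 * mean_sq)
    return gamma * np.exp(-beta * sq_diff)  # identical colours -> gamma, very different -> near 0

# One row of three pixels: black, black, white.
img = np.array([[[0, 0, 0], [0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
w = horizontal_nlink_weights(img)
```

The edge between the two black pixels gets the full weight γ = 50, while the black-white edge gets a much smaller weight, so the min-cut prefers to cut there.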
Two additional nodes are then added: the Source node and the Sink node. Every pixel is connected to both of them, with edge weights derived from the GMM probabilities of the pixel belonging to the foreground (Source) and to the background (Sink).
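A common way to turn these probabilities into edge weights (and, as far as I can tell, what OpenCV does for pixels that are not sure foreground/background) is to use negative log-likelihoods: the Source link gets −log p(z | background) and the Sink link gets −log p(z | foreground). Cutting a link then "pays" the cost of the label the pixel did not get. Below is a sketch, with a hypothetical function name and toy single-component GMMs:

```python
import numpy as np

def gmm_neg_log_likelihood(pixels, weights, means, covs):
    """-log p(z) under a colour GMM, for each row of pixels (N, 3)."""
    z = np.asarray(pixels, dtype=np.float64)
    prob = np.zeros(len(z))
    for pi, mu, cov in zip(weights, means, covs):
        if pi == 0:
            continue  # skip empty components
        inv = np.linalg.inv(cov)
        det = np.linalg.det(cov)
        diff = z - mu
        mahal = np.einsum('ni,ij,nj->n', diff, inv, diff)  # (z-mu)^T Sigma^-1 (z-mu)
        norm = 1.0 / np.sqrt((2 * np.pi) ** 3 * det)
        prob += pi * norm * np.exp(-0.5 * mahal)
    return -np.log(np.maximum(prob, np.finfo(float).tiny))  # avoid log(0)

# Toy single-component GMMs: bright foreground, dark background.
weights = np.array([1.0])
fg = (weights, np.array([[200.0, 200.0, 200.0]]), np.array([np.eye(3) * 100.0]))
bg = (weights, np.array([[20.0, 20.0, 20.0]]), np.array([np.eye(3) * 100.0]))

pixels = np.array([[190, 195, 205], [25, 15, 20]])
source_w = gmm_neg_log_likelihood(pixels, *bg)  # cost of labeling each pixel background
sink_w = gmm_neg_log_likelihood(pixels, *fg)    # cost of labeling each pixel foreground
```

The bright pixel ends up with a strong Source link and a weak Sink link, and the dark pixel the other way round.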
Now, we use a min-cut algorithm to segment the graph. It divides the graph into two groups (separating the source node from the sink node) so that the total weight of the cut edges, i.e. the cost function, is minimized. All pixels (nodes) that remain connected to the source are labeled foreground, and those connected to the sink are labeled background. This is our new labeling. The process (GMM update followed by min-cut) is repeated until convergence.
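To see the min-cut step in isolation, here is a small sketch using the Edmonds-Karp max-flow algorithm (real GrabCut implementations use faster, specialized max-flow solvers). By the max-flow/min-cut theorem, once no augmenting path is left, the nodes still reachable from the source form the source (foreground) side of a minimum cut. The graph is a made-up row of three pixels, where a higher Source capacity means a more foreground-like pixel.

```python
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the nodes on the source side of a minimum cut.

    capacity: dict mapping node -> {neighbour: capacity} (directed edges).
    """
    # Residual graph, with reverse edges initialised to 0 where missing.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs_parents():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    if v == sink:
                        return parent
                    queue.append(v)
        return parent  # sink not reached

    while True:
        parent = bfs_parents()
        if sink not in parent:
            return set(parent)  # everything still reachable from the source
        # Bottleneck capacity along the augmenting path.
        bottleneck, v = float('inf'), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

# s -> pixel: foreground-likeness; pixel -> t: background-likeness;
# pixel <-> pixel: similarity (n-links, both directions).
capacity = {
    's':  {'p0': 9, 'p1': 5, 'p2': 1},
    'p0': {'t': 1, 'p1': 2},
    'p1': {'t': 4, 'p0': 2, 'p2': 2},
    'p2': {'t': 9, 'p1': 2},
}
foreground = min_cut_source_side(capacity, 's', 't') - {'s'}
print(sorted(foreground))  # ['p0', 'p1']
```

Here p1 is only mildly foreground-like on its own, but its strong link to p0 pulls it to the foreground side: that is how the smoothness term cleans up noisy labels.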
There is also an additional option for user editing. In some cases, the segmentation output will not be perfect: some foreground regions may be marked as background and vice versa. In that case, the user can mark the faulty regions as foreground or background using a mask image and re-run the algorithm. This approach is shown below.
OpenCV
OpenCV provides a built-in function, cv2.grabCut(), that implements the GrabCut algorithm. It supports both modes discussed above: initialization with a rectangle or with a mask. The syntax is given below.
```
mask, bgdModel, fgdModel = cv2.grabCut(img, mask, rect, bgdModel, fgdModel, iterCount[, mode])
```
- img: Input 8-bit, 3-channel image.
- mask: 8-bit, single-channel image of the same size as img. In rectangle mode, the function initializes the mask itself, so you can simply pass an array of 0’s. In mask mode, the input mask should label the background and foreground regions, e.g. with 0’s and 1’s respectively (the full set of values is listed below).
- rect: ROI containing the foreground, in the form (x, y, w, h). Only used when mode=cv2.GC_INIT_WITH_RECT; otherwise set it to None.
- bgdModel, fgdModel: Temporary arrays used internally by the algorithm. Just create two arrays of 0’s of size (1, 65) and dtype float64, and do not modify them while you are processing the same image.
- iterCount: Number of iterations the algorithm should run.
- mode: Either cv2.GC_INIT_WITH_RECT or cv2.GC_INIT_WITH_MASK, depending on whether we are drawing a rectangle or a mask with strokes.
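The size (1, 65) of bgdModel and fgdModel is not arbitrary. Assuming the K = 5 components from the theory section, each component of one GMM needs 1 weight, 3 mean values (one per color channel) and 9 covariance entries. This matches how OpenCV packs its model, although the exact internal layout is an implementation detail you should not rely on:

```python
K = 5               # Gaussian components per GMM
weight = 1          # mixing weight pi_k
mean = 3            # mu_k: one value per colour channel
covariance = 3 * 3  # Sigma_k: full 3x3 colour covariance

model_size = K * (weight + mean + covariance)
print(model_size)  # 65
```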
This outputs the modified mask image, in which each pixel belongs to one of 4 classes: sure background, sure foreground, probable background, and probable foreground. These are specified by the values 0, 1, 2, and 3, or by the flags cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, and cv2.GC_PR_FGD, respectively. Now, let’s take the below image and implement this algorithm using OpenCV.
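To make the four classes concrete, here is a small NumPy sketch on a hand-made mask (illustrative values, not real output). In real code you can use the cv2.GC_* flags directly; they are redefined here only so the snippet runs without OpenCV. np.isin gives the same result as the np.where expression used later in this post.

```python
import numpy as np

# Same values as cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

# A hand-made 2x3 mask standing in for cv2.grabCut() output.
mask = np.array([[GC_BGD, GC_PR_BGD, GC_PR_FGD],
                 [GC_FGD, GC_PR_FGD, GC_BGD]], dtype=np.uint8)

# Foreground = sure or probable foreground; everything else is background.
binary = np.isin(mask, [GC_FGD, GC_PR_FGD]).astype(np.uint8)
print(binary)
```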
Let’s start with the rectangle mode. First, load the image and create a mask of 0’s, along with bgdModel and fgdModel, as discussed above.
```python
import cv2
import numpy as np

# Load the image
img = cv2.imread('D:/downloads/messi.jpg')
# Create a 0's mask
mask = np.zeros(img.shape[:2], np.uint8)
# Create 2 arrays for background and foreground model
bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)
```
Next, draw the rectangle around the ROI. The coordinates can be obtained by opening the image in matplotlib or any other application such as Paint. I used matplotlib, as shown below, and found the coordinates by hovering the mouse over the image.
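```python
import matplotlib.pyplot as plt

# IPython/Jupyter magic: show the plot in an interactive Qt window
# (remove this line in a plain .py script)
%matplotlib qt5

# OpenCV loads images as BGR, so the colors look swapped here;
# this does not affect the pixel coordinates.
plt.imshow(img)
plt.show()
```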
The rectangle is shown in red in the image below.
Now, run the GrabCut algorithm. This outputs the modified mask.
```python
rect = (350, 66, 300, 624)  # (x, y, w, h)
mask, bgdModel, fgdModel = cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)
```
As discussed, in the modified mask, 0 and 2 correspond to the background while 1 and 3 correspond to the foreground. So, set the background pixels to 0 and the foreground pixels to 1, and multiply the image by this mask.
```python
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img_seg = img * mask2[:, :, np.newaxis]
```
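The multiplication works because of NumPy broadcasting: the mask has shape (H, W), and adding a trailing axis with np.newaxis gives (H, W, 1), which NumPy stretches across the 3 color channels. A toy example with made-up values:

```python
import numpy as np

img = np.full((2, 2, 3), 200, dtype=np.uint8)            # toy 2x2 BGR image
mask2 = np.array([[1, 0], [0, 1]], dtype=np.uint8)       # 1 = foreground

# (2, 2, 3) * (2, 2, 1) -> (2, 2, 3): each mask value scales all 3 channels.
img_seg = img * mask2[:, :, np.newaxis]
print(img_seg.shape)  # (2, 2, 3)
```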
Below are the mask and the segmented output image.
Clearly, the entire background region is segmented out, but some foreground regions are missing, such as the head, the fingers, and the captain's armband. So, let’s see how to recover these regions using the user editing feature discussed above.
Open this segmented image in any editing software, such as Paint, and mark the missing regions with any color. Here, I have used white, as shown below. Since no background region was misclassified as foreground, there is no need to label the background. If that is not the case, mark those regions with a different color. Below is the marked image.
To obtain the mask, subtract the segmented image from the marked image.
```python
# Load the marked image
img_mark = cv2.imread('D:/downloads/messi_mask.jpg')
# Subtract to obtain the mask
mask_dif = cv2.subtract(img_mark, img_seg)
# Convert the mask to grey and threshold it
mask_grey = cv2.cvtColor(mask_dif, cv2.COLOR_BGR2GRAY)
ret, mask1 = cv2.threshold(mask_grey, 200, 255, cv2.THRESH_BINARY)
```
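One detail worth knowing: cv2.subtract() saturates, i.e. negative results are clamped to 0, whereas plain NumPy subtraction on uint8 arrays wraps around modulo 256 and would turn small negative differences into large, bright values. The sketch below uses made-up pixel values and emulates saturation with np.clip, so it runs without OpenCV:

```python
import numpy as np

marked = np.array([255, 120, 30], dtype=np.uint8)   # pixels in the marked image
segmented = np.array([0, 120, 60], dtype=np.uint8)  # same pixels in the segmented image

wrapped = marked - segmented  # uint8 wrap-around: 30 - 60 becomes 226
saturated = np.clip(marked.astype(np.int16) - segmented, 0, 255).astype(np.uint8)  # 30 - 60 becomes 0
```

With wrap-around, the third pixel would survive the threshold of 200 even though the user never painted it; saturation keeps only the genuinely brighter (painted) pixels.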
The final mask is shown below. It contains exactly what you marked in the editing software, so you could also create this mask directly in any image editor.
Now, in the modified mask, mark these pixels as sure foreground. Remember, sure foreground is labeled 1 (cv2.GC_FGD) in the mask.
Your mask now contains the user-edited information. Run the GrabCut algorithm again, this time in mask mode.
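In code, this is a single boolean-indexing assignment (the same line appears in the full listing at the end). The toy arrays below are made up for illustration: `mask` stands in for the 4-class output of the first grabCut run and `mask1` for the thresholded user marks.

```python
import numpy as np

GC_FGD = 1  # same value as cv2.GC_FGD

mask = np.array([[0, 2, 3],
                 [2, 2, 0]], dtype=np.uint8)
mask1 = np.array([[0, 255, 0],
                  [0, 255, 0]], dtype=np.uint8)  # 255 where the user painted

mask[mask1 == 255] = GC_FGD  # user-marked pixels become sure foreground
print(mask)
```

If you had also marked background regions in a separate mask, you would set those pixels to 0 (cv2.GC_BGD) in the same way.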
```python
mask, bgdModel, fgdModel = cv2.grabCut(img, mask, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)
```
Again, change pixels with values 0 and 2 to background (0), and pixels with values 1 and 3 to foreground (1).
```python
mask_final = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img_out = img * mask_final[:, :, np.newaxis]
```
Below is the final segmented image.
See how we are now able to segment the missing foreground regions. The full code is given below.
```python
import numpy as np
import cv2

# Load the image
img = cv2.imread('D:/downloads/messi.jpg')
# Create a 0's mask
mask = np.zeros(img.shape[:2], np.uint8)
# Create 2 arrays for background and foreground model
bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)

# First pass: rectangle mode
rect = (350, 66, 300, 624)
mask, bgdModel, fgdModel = cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img_seg = img * mask2[:, :, np.newaxis]

# Load the marked image
img_mark = cv2.imread('D:/downloads/messi_mask.jpg')
# Subtract to obtain the mask
mask_dif = cv2.subtract(img_mark, img_seg)
# Convert the mask to grey and threshold it
mask_grey = cv2.cvtColor(mask_dif, cv2.COLOR_BGR2GRAY)
ret, mask1 = cv2.threshold(mask_grey, 200, 255, cv2.THRESH_BINARY)

# Mark the user-painted pixels as sure foreground
mask[mask1 == 255] = 1

# Second pass: mask mode
mask, bgdModel, fgdModel = cv2.grabCut(img, mask, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)
mask_final = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img_out = img * mask_final[:, :, np.newaxis]

cv2.imshow('a', img_out)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
In the next blog, we will see an interactive implementation of the GrabCut algorithm. I hope you enjoyed reading.
Referenced Research Paper: “GrabCut”: interactive foreground extraction using iterated graph cuts
If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve myself. Goodbye until next time!