Q1. Which of the following object detection networks uses a ROI Pooling layer?
R-CNN
Fast R-CNN
YOLO
All of the above
Answer: 2 Explanation: Of the networks listed, only Fast R-CNN uses a ROI Pooling layer. Because of this, Fast R-CNN can take an image of any size as input, whereas in R-CNN each region proposal must be resized before being passed to the CNN. Refer to the Fast R-CNN research paper to understand more.
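As a minimal sketch (not from the paper itself), torchvision's `roi_pool` illustrates the key property: proposals of different sizes on a shared feature map are all pooled to the same fixed output size, so the downstream fully connected layers always see a fixed-length input. The feature-map size, channel count, and box coordinates below are arbitrary toy values.

```python
import torch
from torchvision.ops import roi_pool

# A dummy feature map: batch of 1, 256 channels, 32x32 spatial size.
features = torch.randn(1, 256, 32, 32)

# Two region proposals in (batch_index, x1, y1, x2, y2) format, given
# directly in feature-map coordinates here (hence spatial_scale=1.0).
rois = torch.tensor([
    [0,  2.0,  2.0, 20.0, 25.0],   # a tall region
    [0, 10.0,  5.0, 30.0, 12.0],   # a wide region
])

# Both regions, despite their different sizes, are pooled to a fixed 7x7 grid.
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])
```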
Q2. Which of the following techniques can be used to reduce the number of channels/feature maps?
Pooling
Padding
1×1 convolution
Batch Normalization
Answer: 3 Explanation: A 1×1 convolution can be used to reduce the number of channels/feature maps, since it acts as a per-pixel projection across channels while leaving the spatial dimensions untouched. Refer to this beautiful explanation by Andrew Ng to understand more.
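A minimal PyTorch sketch of the idea (the channel counts and spatial size are arbitrary): a 1×1 convolution shrinks 256 feature maps to 64 without changing height or width.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)          # 256 feature maps of size 28x28

# A 1x1 convolution mixes channels at each spatial position independently,
# here projecting 256 channels down to 64 while leaving H and W untouched.
reduce_channels = nn.Conv2d(in_channels=256, out_channels=64, kernel_size=1)

y = reduce_channels(x)
print(y.shape)  # torch.Size([1, 64, 28, 28])
```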
Q3. Which of the following networks has the fastest prediction time?
R-CNN
Fast R-CNN
Faster R-CNN
Answer: 3 Explanation: Faster R-CNN has the fastest prediction time of the three: it replaces the slow, external selective search step with a Region Proposal Network (RPN) that shares convolutional features with the detection network, so proposals come almost for free. Refer to the Faster R-CNN research paper to understand more.
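As an illustration (using torchvision's off-the-shelf model, not the original implementation; `weights="DEFAULT"` assumes a recent torchvision), a single forward pass through Faster R-CNN produces both proposals and final detections, since the RPN lives inside the network:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Faster R-CNN with a ResNet-50 + FPN backbone. Region proposals come from
# an in-network RPN, so one forward pass yields both proposals and final
# detections -- no external selective search step is needed.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)           # a dummy RGB image in [0, 1]
with torch.no_grad():
    detections = model([image])[0]

print(detections["boxes"].shape, detections["labels"].shape)
```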
Q4. Max-Pooling makes the Convolutional Neural Network translation invariant (for small translations of the input)?
True
False
Answer: 1 Explanation: According to Ian Goodfellow, Max pooling achieves partial invariance to small translations because the max of a region depends only on the single largest element. If a small translation doesn’t bring in a new largest element at the edge of the pooling region and also doesn’t remove the largest element by taking it outside the pooling region, then the max doesn’t change.
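A small PyTorch demonstration of this partial invariance (toy values chosen for illustration): shifting the input one pixel to the right leaves the max-pooled output unchanged, because the maximum stays inside the same pooling window.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[0., 0., 9., 0., 0., 0.],
                    [0., 0., 0., 0., 0., 0.]]]])   # peak at column 2

# Shift the input one pixel to the right (the peak moves to column 3).
shifted = torch.roll(x, shifts=1, dims=3)

# With a 2x2 max pool (stride 2), both peaks fall in the same pooling
# window, so the pooled outputs are identical: partial invariance.
print(F.max_pool2d(x, 2))        # tensor([[[[0., 9., 0.]]]])
print(F.max_pool2d(shifted, 2))  # tensor([[[[0., 9., 0.]]]])
```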
Q5. What do you mean by the term “Region Proposals” as used in the R-CNN paper?
regions of an image that could possibly contain an object of interest
regions of an image that could possibly contain information other than the object of interest
final bounding boxes given by the R-CNN
Answer: 1 Explanation: As the name suggests, region proposals are a set of candidate regions that could possibly contain an object of interest. These region proposals are fed to a CNN, which extracts features from each proposal, and these features are then fed to an SVM classifier to determine what type of object (if any) is contained within the proposal. The main reason for extracting region proposals beforehand is that instead of searching for the object at every image location, we search only those locations where an object is likely to be present. This also reduces false positives, since we only look in regions that plausibly contain an object. Refer to the R-CNN research paper to understand more.
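For a concrete sketch, R-CNN generated its proposals with Selective Search; OpenCV ships an implementation in its contrib modules (this requires the `opencv-contrib-python` package, and `"image.jpg"` below is a placeholder path).

```python
import cv2

# Selective Search (as used by R-CNN) groups perceptually similar pixels
# into candidate regions that may contain objects.
image = cv2.imread("image.jpg")

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()

rects = ss.process()           # array of (x, y, w, h) region proposals
print(f"{len(rects)} region proposals")
```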
Q6. Because a Pooling layer has no parameters, it doesn't affect the gradient calculation during backpropagation?
True
False
Answer: 2 Explanation: It is true that a Pooling layer has no parameters, so no learning takes place in it during backpropagation. But it is wrong to say that it doesn't affect the gradient calculation: the pooling layer routes the gradient back to the input from which the pooling output came. Refer to this link to know more.
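A tiny PyTorch experiment (with arbitrary toy values) makes this concrete: max pooling contributes no parameters, yet during backpropagation it routes the gradient only to the input position the maximum came from.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]], requires_grad=True)

# Max pool the single 2x2 window down to one value (the 4.0).
y = F.max_pool2d(x, kernel_size=2)
y.backward()

# The pooling layer has no weights, but it still shapes the gradient:
# only the position of the max receives a gradient, the rest get zero.
print(x.grad)
# tensor([[[[0., 0.],
#           [0., 1.]]]])
```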
Q7. Which of the following techniques was used by traditional computer vision object detection algorithms to locate objects in images at varying scales and locations?
image pyramids for varying scale and sliding windows for varying locations
image pyramids for varying locations and sliding windows for varying scale
Answer: 1 Explanation: Because an object can be of any size and can appear at any location, object detection has to search over both scales and locations. Image pyramids (multi-resolution representations of an image) handle the scale dependency, and a sliding window handles the varying locations, so traditional computer vision algorithms combine the two for object detection. For instance, refer to the OverFeat paper, which shows how a multiscale, sliding-window approach can be efficiently implemented within a ConvNet.
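A dependency-free sketch of the two ideas (the window size, step, and scale factor are arbitrary choices for illustration): the pyramid handles scale, the sliding window handles location, and a classifier would score every crop.

```python
import numpy as np

def pyramid(image, scale=1.5, min_size=(64, 64)):
    """Yield progressively smaller copies of the image (handles scale)."""
    yield image
    while True:
        h, w = image.shape[:2]
        new_h, new_w = int(h / scale), int(w / scale)
        if new_h < min_size[0] or new_w < min_size[1]:
            break
        # Crude nearest-neighbour downsampling keeps the sketch dependency-free.
        rows = np.arange(new_h) * h // new_h
        cols = np.arange(new_w) * w // new_w
        image = image[rows][:, cols]
        yield image

def sliding_window(image, step=32, window=(64, 64)):
    """Yield (x, y, crop) for every window position (handles location)."""
    for y in range(0, image.shape[0] - window[0] + 1, step):
        for x in range(0, image.shape[1] - window[1] + 1, step):
            yield x, y, image[y:y + window[0], x:x + window[1]]

img = np.zeros((256, 256, 3), dtype=np.uint8)   # dummy image
for level in pyramid(img):
    for x, y, crop in sliding_window(level):
        pass  # a classifier would score each crop here
```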
Q8. How do you introduce non-linearity in a Convolutional Neural Network (CNN)?
Using ReLU
Using a Max-Pooling layer
Both of the above
None of the above
Answer: 3 Explanation: Non-linearity can be introduced either by using ReLU (a non-linear activation function) or by using a Max-Pooling layer (since max is a non-linear function).
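A quick PyTorch check (with arbitrary toy values) that ReLU is indeed non-linear: a linear map must satisfy f(a + b) = f(a) + f(b), and ReLU does not.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([-1.0, 2.0])
b = torch.tensor([3.0, -4.0])

# A linear function f must satisfy f(a + b) == f(a) + f(b).
# ReLU violates this, which is exactly what makes it non-linear:
print(F.relu(a + b))            # tensor([2., 0.])
print(F.relu(a) + F.relu(b))    # tensor([3., 2.])
```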