Q1. Suppose we have an image of size 4×4 and we apply max pooling with a filter of size 2×2 and a stride of 2. The resulting image will be of size:
2×2
2×3
3×3
2×4
Answer: 1 Explanation: Max pooling takes the maximum value at each filter location, so the output has one value per location. With no padding, a 2×2 filter moved with a stride of 2 fits at floor((4 − 2) / 2) + 1 = 2 positions along each axis, so the output image size is 2×2. Refer to this beautiful explanation by Andrew Ng to understand more.
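The size calculation can be checked with a small sketch. This is a minimal pure-Python illustration, not a library implementation: the function name `max_pool_2d` and the sample pixel values are made up for the example, and it assumes a square window with no padding.

```python
def max_pool_2d(image, size=2, stride=2):
    """Max-pool a 2D list of numbers with a square window and no padding."""
    rows, cols = len(image), len(image[0])
    # Number of window positions along each axis.
    out_rows = (rows - size) // stride + 1
    out_cols = (cols - size) // stride + 1
    return [
        [
            # Maximum over the size x size window whose top-left corner
            # is at (r * stride, c * stride).
            max(
                image[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Hypothetical 4x4 image.
image = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
pooled = max_pool_2d(image)
print(pooled)  # [[6, 4], [8, 9]]
```

The 4×4 input yields a 2×2 output, one maximum per non-overlapping 2×2 window.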
Q2. In Faster R-CNN, which loss function is used in the bounding box regressor?
L2 Loss
Smooth L1 Loss
Log Loss
Huber Loss
Answer: 2 Explanation: In Faster R-CNN, Smooth L1 loss is used in the bounding box regressor. This is a robust L1 loss that is less sensitive to outliers than the L2 loss used in R-CNN and SPPnet. (Smooth L1 coincides with the Huber loss when the Huber threshold is 1, but Smooth L1 is the name used in the paper.) Refer to Section 3.1.2 of this research paper to understand more.
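To see why Smooth L1 is less sensitive to outliers, here is a minimal sketch of the per-coordinate loss, assuming the usual definition (0.5·x² when |x| < 1, and |x| − 0.5 otherwise). The function names and sample errors are illustrative only.

```python
def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors."""
    ax = abs(x)
    if ax < 1.0:
        return 0.5 * x * x
    return ax - 0.5


def half_l2(x):
    """Half squared error, shown only for comparison."""
    return 0.5 * x * x


# Hypothetical regression errors: a small one and an outlier.
for err in (0.5, 3.0):
    print(err, smooth_l1(err), half_l2(err))
```

For the small error both losses agree, while for the outlier the Smooth L1 penalty grows linearly instead of quadratically.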
Q3. For binary classification, we generally use the ________ loss function.
Binary cross-entropy
Mean squared error
Mean absolute error
CTC
Answer: 1 Explanation: For binary classification, we generally use the binary cross-entropy loss function. Refer to this beautiful explanation by Andrew Ng to understand more.
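As a rough illustration, binary cross-entropy averages −[y·log(p) + (1 − y)·log(1 − p)] over the examples. The sketch below assumes labels in {0, 1} and predicted probabilities in (0, 1); the function name and the clipping constant `eps` are choices made for this example.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip to avoid log(0) for predictions of exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1]
good = binary_cross_entropy(labels, [0.9, 0.1, 0.8])  # mostly right
bad = binary_cross_entropy(labels, [0.2, 0.7, 0.3])   # mostly wrong
print(good, bad)
```

Predictions that are confident and correct give a small loss, while confident wrong predictions are penalised heavily.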
Q4. How do we perform the convolution operation in computer vision?
we multiply the filter weights with the corresponding image pixels, and then sum these up
we multiply the filter weights with the corresponding image pixels, and then subtract these up
we add the filter weights and the corresponding image pixels, and then multiply these up
we add the filter weights with the corresponding image pixels, and then sum these up
Answer: 1 Explanation: In convolution, we multiply the filter weights with the corresponding image pixels, and then sum these up to get one output value per filter location.
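The multiply-and-sum step can be sketched as follows. This is a minimal illustration assuming no padding and a stride of 1, and, as is common in deep learning, the filter is not flipped (strictly speaking a cross-correlation). The function name and the sample values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-4, -4], [-4, -4]]
```

For the top-left location, the result is 1·1 + 2·0 + 4·0 + 5·(−1) = −4.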
Q5. In a Region Proposal Network (RPN), what is used in the last layer for calculating the objectness scores at each sliding window position?
Softmax
Linear SVM
ReLU
Sigmoid
Answer: 1 Explanation: In a Region Proposal Network (RPN), the authors of the Faster R-CNN paper use a two-class softmax layer to calculate the objectness scores (object vs. not object) for each proposal at each sliding window position.
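A two-class softmax turns a pair of raw scores into probabilities that sum to 1. The sketch below uses made-up scores for a single proposal and also shows that the two-class case is equivalent to a sigmoid applied to the difference of the two scores. The function names and values are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical raw scores for one proposal: [object, not object].
logits = [2.0, -1.0]
probs = softmax(logits)
print(probs)                           # objectness is probs[0]
print(sigmoid(logits[0] - logits[1]))  # same value as probs[0]
```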
Q6. In R-CNN, does the regression model output the actual absolute coordinates of the bounding boxes?
Yes
No
Answer: 2 Explanation: In R-CNN, the regression model outputs deltas, that is, the change in the bounding box coordinates relative to the region proposal, instead of absolute coordinates. Refer to Appendix C of this research paper to understand more.
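As a sketch of how such deltas are turned back into a box, the following assumes the parameterisation commonly given for R-CNN: the centre shifts are scaled by the proposal's width and height, and the size changes are applied in log space. The function name and the sample numbers are hypothetical.

```python
import math


def decode_box(proposal, deltas):
    """Apply predicted deltas to a proposal given as (cx, cy, w, h)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (
        pw * dx + px,       # centre x shifted by a fraction of the width
        ph * dy + py,       # centre y shifted by a fraction of the height
        pw * math.exp(dw),  # width scaled in log space
        ph * math.exp(dh),  # height scaled in log space
    )


proposal = (50.0, 50.0, 20.0, 40.0)
print(decode_box(proposal, (0.0, 0.0, 0.0, 0.0)))    # unchanged proposal
print(decode_box(proposal, (0.1, -0.05, 0.0, 0.0)))  # centre moves by (+2, -2)
```

With all deltas at zero the proposal is returned unchanged, which is why the regressor only needs to learn small corrections.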
Q7. Is Dropout a form of Regularization?
Yes
No
Answer: 1 Explanation: Dropout, applied to a layer, consists of randomly dropping out (setting to zero) a number of output features of the layer during training. Because any node can be zeroed, the network can't rely on any single feature and has to spread out the weights, which has an effect similar to other forms of regularization.
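A minimal sketch of the idea, assuming the "inverted dropout" variant in which the surviving activations are scaled up during training so that nothing needs to change at inference time. The function name, the drop rate, and the sample activations are made up for the example.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate` during training."""
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged.
        return list(activations)
    keep = 1.0 - rate
    # Kept values are divided by `keep` so the expected output is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


random.seed(0)
layer_output = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(dropout(layer_output, rate=0.5))                  # mix of 0.0 and 2.0
print(dropout(layer_output, rate=0.5, training=False))  # unchanged
```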
Q8. A fully convolutional network can be used for
Image Segmentation
Object Detection
Image Classification
All of the above
Answer: 4 Explanation: We can use a fully convolutional network for all of the above-mentioned tasks. For instance, for image segmentation we have U-Net, and for object detection we have YOLO (fully convolutional from YOLOv2 onward).