Computer Vision Quiz-2

Q1. Suppose we have an image of size 4×4 and we apply the Max-pooling with a filter of size 2×2 and a stide of 2. The resulting image will be of size:

  1. 2×2
  2. 2×3
  3. 3×3
  4. 2×4

Answer: 1
Explanation: Because in Max-pooling, we take the maximum value for each filter location so the output image size will be 2×2 (the number of filter locations). Refer to this beautiful explanation by Andrew Ng to understand more.

Q2. In Faster R-CNN, which loss function is used in the bounding box regressor?

  1. L2 Loss
  2. Smooth L1 Loss
  3. Log Loss
  4. Huber Loss

Answer: 2
Explanation: In Faster R-CNN, Smooth L1 loss is used in the bounding box regressor. This is a robust L1 loss that is less sensitive to outliers than the L2 loss used in R-CNN and SPPnet. Refer to Section 3.1.2 of this research paper to understand more.

Q3. For binary classification, we generally use ________ loss function?

  1. Binary crossentropy
  2. mean squared error
  3. mean absolute error
  4. ctc

Answer: 1
Explanation: For binary classification, we generally use Binary crossentropy loss function. Refer to this beautiful explanation by Andrew Ng to understand more.

Q4. How do we perform the convolution operation in computer vision?

  1. we multiply the filter weights with the corresponding image pixels, and then sum these up
  2. we multiply the filter weights with the corresponding image pixels, and then subtract these up
  3. we add the filter weights and the corresponding image pixels, and then multiply these up
  4. we add the filter weights with the corresponding image pixels, and then sum these up

Answer: 1
Explanation: In Convolution, we multiply the filter weights with the corresponding image pixels, and then sum these up.

Q5. In a Region Proposal Network (RPN), what is used in the last layer for calculating the objectness scores at each sliding window position?

  1. Softmax
  2. Linear SVM
  3. ReLU
  4. Sigmoid

Answer: 1
Explanation: In a Region Proposal Network (RPN), the authors of Faster R-CNN paper uses a 2 class softmax layer for calculating the objectness scores for each proposal at each sliding window position.

Q6. In R-CNN, the regression model outputs the actual absolute coordinates of the bounding boxes?

  1. Yes
  2. No

Answer: 2
Explanation: In R-CNN, the regression model outputs the deltas or the relative coordinate change of the bounding boxes instead of absolute coordinates. Refer to Appendix C of this research paper to understand more.

Q7. Is Dropout a form of Regularization?

  1. Yes
  2. No

Answer: 1
Explanation: Dropout, applied to a layer, consists of randomly dropping out(setting to zero) a number of output features of the layer during training. Because as any node can become zero, we can’t rely on any one feature so have to spread out the weights similar to regularization.

Q8. A fully convolutional network can be used for

  1. Image Segmentation
  2. Object Detection
  3. Image Classification
  4. All of the above

Answer: 4
Explanation: We can use a fully convolutional network for all of the above mentioned tasks. For instance, for image segmentation we have U-Net, for object detection we have YOLO etc.

Leave a Reply