Machine Learning Quiz-2

Q1. Which of the following is a good choice for image-related tasks such as image classification or object detection?

  1. Multilayer Perceptron (MLP)
  2. Convolutional Neural Network (CNN)
  3. Recurrent Neural Network (RNN)
  4. All of the above

Answer: 2
Explanation: A Convolutional Neural Network (CNN) is a good choice for image-related tasks such as image classification or object detection, for two main reasons. The first is parameter sharing: a feature detector that is useful in one part of an image is probably useful in another part of the same image, so a CNN needs far fewer parameters. The second is sparsity of connections: in each layer, each output value depends only on a small number of inputs (as many as the filter covers).
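
To make the parameter-sharing argument concrete, here is a rough sketch that counts the parameters of a fully connected layer and a convolutional layer mapping the same input to the same output size. The layer sizes (a 32x32 RGB input, six 5x5 filters, a 28x28x6 output) and the helper names are assumptions chosen for illustration, not part of the quiz.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit in a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(filter_h, filter_w, in_channels, n_filters):
    """Weights plus one bias per filter; the same filters are reused at every position."""
    return (filter_h * filter_w * in_channels + 1) * n_filters

# Hypothetical sizes: 32x32x3 input, six 5x5 filters, no padding -> 28x28x6 output
n_in = 32 * 32 * 3
n_out = 28 * 28 * 6

dense = dense_params(n_in, n_out)
conv = conv_params(5, 5, 3, 6)

print("fully connected:", dense)  # 14455392
print("convolutional:  ", conv)   # 456
```

Under these assumed sizes the convolutional layer needs a few hundred parameters where the fully connected layer needs over 14 million.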

Q2. Which of the following statements is correct?

  1. RMSprop divides the learning rate by an exponentially decaying average of squared gradients
  2. RMSprop divides the learning rate by an exponentially increasing average of squared gradients
  3. RMSprop has a constant learning rate
  4. RMSprop decays the learning rate by a constant value

Answer: 1
Explanation: The weight update equation in RMSprop is w = w - α*dw/(Sdw + ε)^0.5, where Sdw is an exponentially weighted (decaying) average of the squared gradients and ε is a small constant for numerical stability. Thus, RMSprop divides the learning rate by (the square root of) an exponentially decaying average of squared gradients. Refer to this beautiful explanation by Andrew Ng to know more.
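
The update can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `rmsprop_step`, the default values of `alpha`, `beta` and `eps`, and the toy objective f(w) = w^2 are all assumptions made for the example.

```python
import numpy as np

def rmsprop_step(w, dw, s_dw, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update; the alpha, beta and eps defaults are illustrative assumptions."""
    # Exponentially weighted (decaying) average of squared gradients
    s_dw = beta * s_dw + (1.0 - beta) * dw ** 2
    # Same form as the equation above: w = w - alpha * dw / (Sdw + eps)^0.5
    w = w - alpha * dw / np.sqrt(s_dw + eps)
    return w, s_dw

# Toy objective f(w) = w^2, whose gradient is 2w
w, s_dw = 5.0, 0.0
for _ in range(100):
    w, s_dw = rmsprop_step(w, 2.0 * w, s_dw)

print(w)  # smaller than the starting value of 5.0
```

Because the gradient is divided by the running root-mean-square of past gradients, the effective step size stays close to `alpha` regardless of how large the raw gradient is.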

Q3. _____ is a type of gradient descent that processes one training example per iteration.

  1. Stochastic Gradient Descent
  2. Batch Gradient Descent
  3. Mini-batch Gradient Descent
  4. None of the above.

Answer: 1
Explanation: Stochastic Gradient Descent processes 1 training example per iteration of gradient descent.
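
The three variants in the options differ only in how many examples feed each parameter update. The sketch below counts updates per pass over a toy dataset; the helper `iterate_batches` and the dataset of 10 examples are hypothetical, and real training loops would usually also shuffle the data each epoch.

```python
import numpy as np

def iterate_batches(X, y, batch_size):
    """Yield consecutive (X, y) slices of at most batch_size examples."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# Toy dataset with 10 examples (assumed for illustration)
X = np.arange(10, dtype=float).reshape(10, 1)
y = 2.0 * X[:, 0]

# Number of parameter updates in one pass over the data
sgd_updates = sum(1 for _ in iterate_batches(X, y, 1))             # stochastic: 1 example each
minibatch_updates = sum(1 for _ in iterate_batches(X, y, 4))       # mini-batch of 4
batch_updates = sum(1 for _ in iterate_batches(X, y, len(X)))      # full batch

print(sgd_updates, minibatch_updates, batch_updates)  # 10 3 1
```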

Q4. Let's say you have trained a cat classifier on 10 million cat images and it is performing well in the live environment. You then encounter a new cat species in the live environment, and the deployed model's performance starts to degrade. You have only 1000 images of the newly identified species. Which of the following steps should you take first?

  1. Put all 1000 images in the training set and start training asap
  2. Try data augmentation on these 1000 images to get more data
  3. Split the 1000 images into train/test set and start the training
  4. Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress with the model

Answer: 4
Explanation: We have very little data for the new cat species (1000 images) compared to the 10 million original images, so adding these 1000 images to the training set or splitting them into train/test sets will make little difference. Augmentation also cannot grow the dataset to anywhere near that scale (10 million). So the remaining option is to build a new evaluation metric that penalizes the model more for wrong predictions on the new species.

Q5. Which of the following is an example of supervised learning?

  1. Given the data of house prices and house sizes, predict house price as a function of house size
  2. Given 50 spam and 50 non-spam emails, predict whether the new email is spam/non-spam
  3. Given the data consisting of 1000 images of cats and dogs each, we need to classify to which class the new image belongs
  4. All of the above

Answer: 4
Explanation: For each of the above options we have the correct answer/label for every example, so all of these are examples of supervised learning.

Q6. Which of the following is True for Structured Data?

  1. Structured Data has clear, definable relationships between the data points, with a pre-defined model containing it
  2. Structured data is quantitative, highly organized, and each feature has a well-defined meaning
  3. Structured data is generally contained in relational databases (RDBMS)
  4. All of the above

Answer: 4
Explanation: All of the above are true for structured data. Refer to this link to know more.

Q7. You have built a network using the sigmoid activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*10000. What will happen?

  1. This will cause the inputs to the sigmoid to be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values
  2. It doesn’t matter. As long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small
  3. This will cause the inputs to the sigmoid to be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning
  4. This will cause the inputs to the sigmoid to be very large, thus causing gradients to be close to zero and slowing down learning

Answer: 4
Explanation: When we initialize the weights to very large values, the input to the sigmoid function (calculated as z = w*x + b) also becomes very large in magnitude. For inputs of large magnitude the sigmoid curve is nearly flat, so the gradients are close to 0, which slows down gradient descent (learning).
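
The saturation effect is easy to check numerically. In this small sketch the input `x`, the bias `b` and the base weight 0.5 are made-up values; only the scaling factors (0.01 versus the 10000 from the question) matter for the comparison.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical single unit: z = w*x + b
x, b = 1.0, 0.0
w_small = 0.5 * 0.01     # small initialization
w_large = 0.5 * 10000    # scaled as in the question

grad_small = sigmoid_grad(w_small * x + b)
grad_large = sigmoid_grad(w_large * x + b)

print(grad_small)  # close to the maximum slope of 0.25
print(grad_large)  # effectively zero: the unit is saturated
```

With the small weight the unit sits near the steepest part of the curve, while the large weight pushes it onto the flat tail where almost no gradient flows back.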

Q8. Let's say you are working on a cat classifier and have been asked to work with three different metrics: 1. accuracy, 2. inference time, and 3. memory size. What will you say about the following statement: “Having three evaluation metrics will make it easier for you to quickly choose between two different algorithms, and your team can work faster.”

  1. True
  2. False

Answer: 2
Explanation: It is always good to have a single real-number evaluation metric. With more than one metric, it becomes difficult to assess which model performs better. For instance, if one classifier has precision 60% and recall 40% while another has precision 30% and recall 70%, it is hard to judge which one is better. That’s why we have the F1 score, which combines precision and recall into one metric.
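
Using the precision and recall figures from the example above, the F1 score (the harmonic mean of precision and recall) gives a single number to compare. The function name `f1_score` here is just a local helper written for this sketch, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

f1_a = f1_score(0.60, 0.40)  # first classifier from the example
f1_b = f1_score(0.30, 0.70)  # second classifier from the example

print(round(f1_a, 2), round(f1_b, 2))  # 0.48 0.42
```

On this single metric the first classifier comes out ahead, a conclusion that is hard to reach by looking at the two precision/recall pairs side by side.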
