Machine Learning Quiz-1

Q1. Say you have a dataset of 10 million examples, and training your model on it would take two weeks. Which of the following statements do you most agree with?

  1. If you have already trained a model on a different dataset and it performs well, with 98% dev accuracy on that dataset, just use that model instead of training on the current dataset for two weeks
  2. If 10 million examples are enough to build a good model, you might be better off training with just 1 million examples to gain a 10x improvement in how quickly you can run experiments, even if each model performs a bit worse because it is trained on less data
  3. Train on the complete dataset and wait two weeks to see the first results
  4. All of the above

Answer: 2
Explanation: In machine learning, the best approach is to build an initial model quickly, often on a random subset of the data, and then use bias/variance analysis and error analysis to prioritize the next steps.

Q2. In a Multi-Layer Perceptron (MLP), is each node connected to all the nodes in the previous layer?

  1. True
  2. False

Answer: 1
Explanation: A Multi-Layer Perceptron (MLP) is a fully connected network: each node in one layer connects, with its own weight, to every node in the following layer.
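
To make the connectivity concrete, here is a minimal sketch using PyTorch (the framework is an assumption; the quiz names none). A fully connected layer with 10 inputs and 5 outputs stores exactly 5 x 10 weights, one per connection:

    import torch.nn as nn

    layer = nn.Linear(10, 5)      # 10 inputs -> 5 outputs, fully connected
    print(layer.weight.shape)     # torch.Size([5, 10]): one weight per (output, input) pair
    print(layer.weight.numel())   # 50 = 5 * 10 connections
    print(layer.bias.shape)       # torch.Size([5]): one bias per output node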

Q3. Identify the following activation function: g(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z))

  1. Tanh activation function
  2. Sigmoid activation function
  3. ReLU activation function
  4. Leaky ReLU activation function

Answer: 1
Explanation: This is the tanh activation function. Like the sigmoid, tanh is continuous and differentiable at all points; the key difference is that it is symmetric around the origin, squashing its input into the range (-1, 1) instead of (0, 1).
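
As a quick numerical check (a minimal sketch assuming NumPy), the formula above matches NumPy's built-in tanh:

    import numpy as np

    def g(z):
        # (exp(z) - exp(-z)) / (exp(z) + exp(-z))
        return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

    z = np.linspace(-3, 3, 7)
    print(np.allclose(g(z), np.tanh(z)))   # True
    print(g(np.array([0.0])))              # [0.] -- symmetric around the origin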

Q4. Suppose we have a neural network with 10 nodes in the input layer, 5 nodes in the hidden layer, and 1 node in the output layer. What will be the dimensions of W1 (first-layer weights) and W2 (second-layer weights)?

  1. W1:5×1, W2:1×1
  2. W1:1×10, W2:1×5
  3. W1:1×5, W2:5×10
  4. W1:5×10, W2:1×5

Answer: 4
Explanation: In general, the weight matrix for a layer has dimensions (nodes in current layer x nodes in previous layer), so the answer is W1: 5x10, W2: 1x5.
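
A minimal NumPy sketch of the forward pass (layer sizes from the question; the random values are purely illustrative) shows the shapes lining up:

    import numpy as np

    x = np.random.randn(10, 1)    # one input example: 10 features
    W1 = np.random.randn(5, 10)   # hidden layer: 5 nodes x 10 inputs
    b1 = np.zeros((5, 1))
    W2 = np.random.randn(1, 5)    # output layer: 1 node x 5 hidden units
    b2 = np.zeros((1, 1))

    a1 = np.tanh(W1 @ x + b1)     # shape (5, 1)
    a2 = W2 @ a1 + b2             # shape (1, 1)
    print(a1.shape, a2.shape)     # (5, 1) (1, 1)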

Q5. In dropout, what will happen if we increase the keep probability (keep_prob) from (say) 0.5 to 0.8?

  1. Reducing the regularization effect.
  2. Causing the neural network to end up with a lower training set error.
  3. Both of the above.
  4. None of the above.

Answer: 3
Explanation: Increasing keep_prob from 0.5 to 0.8 means each unit is dropped with probability 0.2 instead of 0.5, so fewer units are zeroed out during training. That weakens the regularization effect, and with weaker regularization the network fits the training set more closely, ending up with a lower training set error.
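
For reference, here is a minimal sketch of inverted dropout (NumPy assumed; the helper name is illustrative). Raising keep_prob zeroes out fewer activations, which is why the regularization effect shrinks:

    import numpy as np

    def inverted_dropout(a, keep_prob):
        # Keep each unit with probability keep_prob, then rescale so the
        # expected value of the activations is unchanged.
        mask = np.random.rand(*a.shape) < keep_prob
        return a * mask / keep_prob

    a = np.ones((5, 1))
    print(inverted_dropout(a, keep_prob=0.5))  # roughly half the units zeroed
    print(inverted_dropout(a, keep_prob=0.8))  # fewer units zeroed -> less regularization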

Q6. Finding good hyperparameter values is very time-consuming. So typically you should do it once at the start of the project, and try to find very good hyperparameters so that you don’t ever have to revisit tuning them again.

  1. True
  2. False

Answer: 2
Explanation: You can’t really know beforehand which set of hyperparameters will work best for your case. You need to follow the iterative process of Idea -> Code -> Experiment.

Q7. In a deep neural network, what is the general rule for the dimensions of the weights and biases of layer l, where n[l] is the number of units in layer l?

  1. w[l] : (n[l], n[l])
    b[l] : (n[l], 1)
  2. w[l] : (n[l+1], n[l])
    b[l] : (n[l-1], 1)
  3. w[l] : (n[l], n[l-1])
    b[l] : (n[l], 1)
  4. w[l] : (n[l], n[l-1])
    b[l] : (n[l-1], 1)

Answer: 3
Explanation: The weights of layer l have dimensions (n[l], n[l-1]) and the biases have dimensions (n[l], 1): each of the n[l] units has one weight per unit in the previous layer and a single bias.
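
The rule applies to any architecture; here is a short sketch (NumPy assumed, layer sizes illustrative) that initializes a whole network with it:

    import numpy as np

    layer_dims = [10, 5, 3, 1]   # n[0]..n[3]: input, two hidden layers, output

    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
        print(f"W{l}: {params[f'W{l}'].shape}, b{l}: {params[f'b{l}'].shape}")
    # W1: (5, 10), b1: (5, 1)
    # W2: (3, 5),  b2: (3, 1)
    # W3: (1, 3),  b3: (1, 1)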

Q8. Which of the following methods can be used for hyperparameter tuning?

  1. Random Search
  2. Grid Search
  3. Bayesian optimization
  4. All of the above.

Answer: 4
Explanation: All of the above methods can be used for hyperparameter tuning: grid search exhaustively tries every combination, random search samples combinations at random, and Bayesian optimization uses the results of previous trials to choose the next configuration to try.
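
As a minimal sketch of the first two methods, assuming scikit-learn is available (the model and parameter grid below are illustrative); Bayesian optimization typically requires a separate library such as scikit-optimize or Optuna:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = make_classification(n_samples=500, random_state=0)
    param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l2"]}

    # Grid search: tries every combination in param_grid.
    grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=3)
    grid.fit(X, y)
    print(grid.best_params_)

    # Random search: samples n_iter combinations at random.
    rand = RandomizedSearchCV(LogisticRegression(max_iter=1000), param_grid,
                              n_iter=3, cv=3, random_state=0)
    rand.fit(X, y)
    print(rand.best_params_)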
