Q1. The optimizer is an important part of training neural networks. Which of the following is not a purpose of using an optimizer?
Speed up algorithm convergence
Reduce the difficulty of manual parameter setting
Avoid overfitting
Avoid local extremes
Answer: 3 Explanation: Overfitting is addressed with regularization techniques, not with optimizers. Optimizers speed up convergence, reduce the burden of manual parameter tuning, and help avoid local extremes.
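To make the distinction concrete, here is a minimal sketch (assuming PyTorch; the model and data are toy placeholders) of the optimizer's role: it turns gradients into weight updates. Swapping torch.optim.SGD for torch.optim.Adam changes how quickly training converges, but neither adds a penalty against overfitting.

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # adaptive update rule
loss_fn = torch.nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)  # toy batch
for _ in range(100):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()                  # compute gradients
    optimizer.step()                 # the optimizer applies the weight update
```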
Q2. Which of the following is not a regularization technique used in machine learning?
L1 regularization
R-square
L2 regularization
Dropout
Answer: 2 Explanation: Of the options above, R-squared is not a regularization technique; it is a statistical measure of how close the data are to the fitted regression line.
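The contrast is easy to show in code. Below is a minimal sketch (assuming scikit-learn and NumPy; the data are synthetic): Lasso applies an L1 penalty while fitting the model, whereas R-squared is only computed afterwards to score the fit. Ridge would apply an L2 penalty the same way, and dropout is the neural-network counterpart.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1)  # L1 regularization; Ridge(alpha=0.1) would be L2
model.fit(X, y)

print("coefficients:", model.coef_)                 # L1 drives some weights to 0
print("R-squared:", r2_score(y, model.predict(X)))  # only an evaluation metric
```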
Q3. Which of the following are hyperparameters in the context of deep learning?
Learning Rate, α
Momentum parameter, β1
Number of units in a layer
All of the above
Answer: 4 Explanation: According to Wikipedia, “In machine learning, a hyperparameter is a parameter whose value is used to control the learning process”. So, all of the above are hyperparameters.
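For illustration, here is a minimal sketch (assuming PyTorch) in which each quantity named in the question is a value we choose before training rather than learn from data:

```python
import torch

hidden_units = 64      # number of units in a layer
learning_rate = 1e-3   # learning rate, alpha
beta1 = 0.9            # momentum parameter, beta_1 (first moment decay in Adam)

model = torch.nn.Sequential(
    torch.nn.Linear(10, hidden_units),
    torch.nn.Tanh(),
    torch.nn.Linear(hidden_units, 1),
)
optimizer = torch.optim.Adam(
    model.parameters(), lr=learning_rate, betas=(beta1, 0.999)
)
```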
Q4. Which of the following statements is not true with respect to batch normalization?
Batch normalization helps in decreasing training time
After using batch normalization there is no need to use dropout
Batch normalization helps in reducing the covariate shift
Answer: 2 Explanation: Batch normalization has a slight regularization effect, but that is not why we use it. It is used to make the neural network more robust (by reducing covariate shift) and easier to train, while dropout is used for regularization (reducing overfitting). Batch normalization therefore does not remove the need for dropout, so the second statement is not true.
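Since the two techniques solve different problems, they are often used together. A minimal sketch (assuming PyTorch; the layer sizes are arbitrary):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(10, 64),
    torch.nn.BatchNorm1d(64),  # normalizes activations across the batch
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),   # still useful for reducing overfitting
    torch.nn.Linear(64, 1),
)
out = model(torch.randn(32, 10))  # batch of 32 samples, 10 features each
```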
Q5. In a machine learning project, modelling is an iterative process but deployment is not.
True
False
Answer: 2 Explanation: Deployment is also an iterative process, in which you should expect to make multiple adjustments (such as to the metrics monitored on dashboards or the percentage of traffic served) to work toward optimizing the system.
Q6. Which of the following activation functions works better for hidden layers?
Sigmoid
Tanh
Answer: 2 Explanation: The tanh activation function usually works better than the sigmoid for hidden units because the mean of its output is closer to zero, so it centers the data better for the next layer, and its gradients are not restricted to moving in a single direction.
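The zero-centering claim is easy to check numerically. A minimal sketch (assuming PyTorch) with standard-normal inputs:

```python
import torch

x = torch.randn(100_000)  # zero-mean inputs
print("mean of tanh(x):", torch.tanh(x).mean().item())        # close to 0.0
print("mean of sigmoid(x):", torch.sigmoid(x).mean().item())  # close to 0.5
```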
Q7. The softmax function is used to calculate the probability distribution over a discrete variable with n possible values.
True
False
Answer: 1 Explanation: The softmax function is used to calculate the probability distribution over a discrete variable with n possible values. It can be seen as a generalization of the sigmoid function, which represents a probability distribution over a binary variable.
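A minimal sketch (assuming PyTorch) showing both properties: softmax yields a valid distribution over n values, and with two values it reduces to the sigmoid:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])
probs = torch.softmax(logits, dim=0)
print(probs, probs.sum())  # non-negative entries that sum to 1

z = torch.tensor([1.3, 0.0])       # binary case: second logit fixed at 0
print(torch.softmax(z, dim=0)[0])  # equals...
print(torch.sigmoid(z[0]))         # ...the sigmoid of the first logit
```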
Q8. Say you want to use transfer learning from task A to task B. Which of the following scenarios would support using transfer learning?
Tasks A and B have the same input x
You have a lot more data for task A than for task B
Low-level features from task A could be helpful for learning task B
All of the above
Answer: 4
Explanation: All of the conditions above make transfer learning from task A to task B likely to help. See Andrew Ng's explanation of transfer learning to learn more.
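As an illustration, here is a minimal sketch (assuming PyTorch and a recent torchvision; ResNet-18 pretrained on ImageNet stands in for task A, and the 10-class head is a placeholder for task B) of the usual recipe: reuse the low-level features learned on the data-rich task and train only a new head:

```python
import torch
import torchvision

# Task A: ImageNet classification (same image inputs, far more data).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False  # freeze the features learned on task A

# Task B: replace the head with a new 10-class classifier and train only it.
model.fc = torch.nn.Linear(model.fc.in_features, 10)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```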