On Calibration of Modern Neural Networks

Neural networks are now used widely and are trusted to make complex decisions in applications such as medical diagnosis, speech recognition, object recognition, and optical character recognition. Thanks to ongoing research in deep learning, their accuracy has improved dramatically.

Alongside being accurate, a neural network should also be able to indicate when it is likely to be incorrect. For example, if the confidence a network assigns to a disease diagnosis is low, control should be passed to human doctors.

So what is a confidence score in a neural network? It is the probability estimate produced by the network. Say you are working on a multi-class classification task: after applying the softmax layer, you find that a particular class has the highest probability, with a value of 0.7. That means the network is 70% confident that this class is the correct output.
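
As a small illustration with made-up logits (the numbers here are purely hypothetical), the confidence score is simply the largest softmax probability:

```python
import tensorflow as tf

# Toy example: raw network outputs (logits) for a single input and three classes.
logits = tf.constant([[2.0, 0.5, 1.2]])
probs = tf.nn.softmax(logits, axis=-1)        # roughly [[0.60, 0.13, 0.27]]
confidence = tf.reduce_max(probs, axis=-1)    # confidence score of the predicted class (~0.60)
prediction = tf.argmax(probs, axis=-1)        # index of the predicted class (0)
```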

Intuitively, calibration means that out of 100 predictions with an average confidence score of 0.8, about 80 should be correctly classified. But modern neural networks are poorly calibrated: as you can see in the figure, the gap between average confidence and accuracy is large for ResNet but small for LeNet.
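
Here is a minimal sketch of this intuition, assuming hypothetical tensors `probs` (softmax outputs, shape N × num_classes) and integer `labels` (shape N) collected on a held-out set:

```python
import tensorflow as tf

# Assumed inputs: `probs` are softmax outputs (N x num_classes),
# `labels` are the true class indices (N,).
confidences = tf.reduce_max(probs, axis=-1)                       # per-example confidence
predictions = tf.argmax(probs, axis=-1)                           # predicted class indices (int64)
correct = tf.cast(tf.equal(predictions, tf.cast(labels, tf.int64)), tf.float32)

accuracy = tf.reduce_mean(correct)
avg_confidence = tf.reduce_mean(confidences)
# For a well-calibrated model these two numbers should be close;
# the figure visualises exactly this gap for LeNet and ResNet.
```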

[Figure: average confidence vs. accuracy for LeNet and ResNet, from the referenced paper]

In the paper, the authors address the following:

  1. Which methods alleviate the poor calibration problem in neural networks.
  2. A simple and straightforward solution to mitigate this problem.

Observing Miscalibration:

With the advancement of deep neural networks, some recent design changes appear to be responsible for miscalibration.

  1. Model Capacity: Although increasing the depth and width of neural networks may reduce classification error, the paper observes that these increases negatively affect model calibration.
  2. Batch Normalization: Batch Normalization improves training time, reduces the need for additional regularization, and can in some cases improve the accuracy of networks. It has been observed that models trained with Batch Normalization tend to be more miscalibrated.
  3. Weight Decay: It has been found that training with less weight decay has a negative impact on calibration.

Temperature Scaling:

Temperature scaling works well for calibrating computer vision models. It is the simplest extension of Platt scaling. To understand temperature scaling, we will first look at Platt scaling.

Platt Scaling: This method is used for calibrating models. It uses logistic regression to return calibrated probabilities from a model's outputs. Say you have trained a classifier on some training data. Platt scaling takes the logits z (the outputs of the trained network before the softmax/sigmoid layer), computed on a validation dataset, as the input to a logistic regression model. That logistic regression is fit on the validation dataset, learning scalar parameters a, b ∈ R, and outputs q = σ(az + b) as the calibrated probability. (In this form, with a single sigmoid σ, Platt scaling applies to binary classifiers; the paper also discusses multi-class extensions.)
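
Below is a minimal sketch of Platt scaling for the binary case, assuming hypothetical tensors `val_logits` (shape N) and binary `val_labels` (shape N, values 0/1) collected on the validation set; the optimiser and number of steps are arbitrary illustrative choices:

```python
import tensorflow as tf

def fit_platt(val_logits, val_labels, steps=200, lr=0.01):
    """Learn Platt-scaling parameters a, b on validation logits (binary case)."""
    a = tf.Variable(1.0)
    b = tf.Variable(0.0)
    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    opt = tf.keras.optimizers.SGD(learning_rate=lr)
    labels = tf.cast(val_labels, tf.float32)

    for _ in range(steps):
        with tf.GradientTape() as tape:
            # q = sigmoid(a*z + b); BinaryCrossentropy(from_logits=True)
            # applies the sigmoid internally.
            loss = bce(labels, a * val_logits + b)
        grads = tape.gradient(loss, [a, b])
        opt.apply_gradients(zip(grads, [a, b]))
    return a.numpy(), b.numpy()

# Usage at test time: the calibrated probability is q = sigmoid(a * z + b)
# a, b = fit_platt(val_logits, val_labels)
# q = tf.sigmoid(a * test_logits + b)
```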

Temperature scaling is an extension of Platt scaling with a single trainable parameter T > 0 shared across all classes; T is called the temperature. T is fit on the validation dataset, not on the training dataset: if T were optimised during training, the network would learn to make the temperature as low as possible so that it could be extremely confident on the training data.

The temperature is applied directly before the softmax layer by dividing the logits by T (i.e., z/T), and T is then optimised on the validation dataset. Once T has been fit, we divide the test-time logits by the learned T and apply the softmax layer to obtain calibrated probabilities. Now, let's see a simple TensorFlow sketch of temperature scaling.
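
This is a minimal sketch, assuming hypothetical tensors `val_logits` (shape N × num_classes) and integer `val_labels` (shape N) collected from the trained network on the validation set. The paper optimises T with respect to the negative log-likelihood; here that is approximated with a few steps of gradient descent:

```python
import tensorflow as tf

def fit_temperature(val_logits, val_labels, steps=100, lr=0.01):
    """Learn the temperature T on the validation set by minimising the NLL."""
    temperature = tf.Variable(1.0)  # T = 1 means no scaling; the paper requires T > 0
    nll = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    opt = tf.keras.optimizers.SGD(learning_rate=lr)

    for _ in range(steps):
        with tf.GradientTape() as tape:
            # Divide every logit by T and measure the NLL on the validation data.
            loss = nll(val_labels, val_logits / temperature)
        grads = tape.gradient(loss, [temperature])
        opt.apply_gradients(zip(grads, [temperature]))
    return temperature.numpy()

def calibrated_probs(test_logits, T):
    """Apply the learned temperature to test-time logits."""
    return tf.nn.softmax(test_logits / T, axis=-1)
```

Note that dividing all logits by a single positive T does not change which class has the largest logit, so the model's predictions and accuracy stay the same; only the confidence estimates are softened (T > 1) or sharpened (T < 1).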

Simple techniques can effectively remedy the miscalibration phenomenon in neural networks. Temperature scaling is the simplest, fastest, and most straightforward of these methods, and surprisingly it is often the most effective.

Referenced Research Paper: On Calibration of Modern Neural Networks

GitHub: Temperature Scaling  

Hope you enjoyed reading.

If you have any doubts or suggestions, please feel free to ask, and I will do my best to help or improve. Goodbye until next time.
