Neural Arithmetic Logic Units

In this tutorial, you will learn about neural arithmetic logic units (NALU).

You can find the full TensorFlow implementation of neural arithmetic logic units in my GitHub repository.

These days, neural networks have a wide range of applications, from simple classification problems to complex self-driving cars, and they perform very well in these fields. But would you believe that a neural network can't count? Even animals as simple as bees can do that.

The problem is that a neural network cannot perform numerical extrapolation outside its training data. It will not even be able to learn a scalar identity function outside its training range. Recently, DeepMind researchers released a paper in which they propose modules that try to solve this problem.

Failure of Neural Networks to Learn a Scalar Identity Function

The problem of neural nets being unable to learn identity relations is not new, but in the paper they demonstrate it with an example.

They used an autoencoder of 3 layers, each of 8 units, and tried to learn the identity relation: if the input is 4, the output should also be 4. They tried different non-linear activation functions in this network, such as sigmoid and tanh, but all of them fail to extrapolate the identity relation outside the training data set.

They also observed that some highly linear activations, such as PReLU, are able to reduce the error. So even though neural networks contain functions that are capable of extrapolation, they fail to learn to do it.
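You can reproduce this failure with a few lines of TensorFlow. The sketch below is only illustrative: the exact architecture, training range, and hyperparameters are my own choices, not the paper's.

```python
import numpy as np
import tensorflow as tf

# Identity function: target equals input, trained on a narrow range.
x_train = np.random.uniform(-5.0, 5.0, size=(10000, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, x_train, epochs=20, batch_size=64, verbose=0)

# Inside the training range the prediction is close to the input...
print(model.predict(np.array([[3.0]], dtype="float32"), verbose=0))
# ...but far outside it the prediction drifts badly away from 100.
print(model.predict(np.array([[100.0]], dtype="float32"), verbose=0))
```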

To solve this problem they proposed two models:

  1. NAC (Neural Accumulator)
  2. NALU (Neural Arithmetic Logic Units)

NAC (Neural Accumulator)

The neural accumulator is able to solve the problems of addition and subtraction.

NAC is a special case of a linear layer whose transformation matrix W consists only of the values {-1, 0, 1}. This makes the output of W an addition or subtraction of rows of the input vector, rather than the arbitrary rescaling produced by non-linear activation functions. For example, if our input layer consists of X1 and X2, then the output of a NAC will be linear combinations of the input values, such as X1 + X2 or X1 - X2. This keeps the numeric representation consistent throughout the model, no matter how many operations are applied.
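As a toy illustration with hand-picked (not learned) weights, here is how a matrix restricted to {-1, 0, 1} can only add, subtract, or ignore its inputs:

```python
import numpy as np

x = np.array([3.0, 5.0])         # inputs X1 and X2

W = np.array([[1.0,  1.0, 0.0],  # every entry is one of {-1, 0, 1}
              [1.0, -1.0, 0.0]])

# Column 1 computes X1 + X2, column 2 computes X1 - X2,
# column 3 ignores both inputs -- no arbitrary rescaling is possible.
print(x @ W)  # [ 8. -2.  0.]
```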

Since W has the hard constraint that every element must be one of {-1, 0, 1}, learning becomes difficult: hard constraints make it hard to update weights during backpropagation. To solve this, they proposed a continuous and differentiable parameterization of W.

W = tanh(w_hat) * σ(m_hat)

w_hat and m_hat are randomly initialized weights that are convenient to learn with gradient descent. This parameterization guarantees that W stays in the range (-1, 1) and is biased to be close to {-1, 0, 1}. Here " * " means element-wise multiplication.
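Putting this parameterization into code, a minimal NAC can be written as a custom Keras layer. This is a sketch based on the equation above; the glorot_uniform initializer is my own assumption, not something the paper prescribes.

```python
import tensorflow as tf

class NAC(tf.keras.layers.Layer):
    """Neural Accumulator: a = matmul(x, W), W = tanh(w_hat) * sigmoid(m_hat)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        # w_hat and m_hat are the unconstrained, learnable parameters.
        self.w_hat = self.add_weight(name="w_hat", shape=(in_dim, self.units),
                                     initializer="glorot_uniform")
        self.m_hat = self.add_weight(name="m_hat", shape=(in_dim, self.units),
                                     initializer="glorot_uniform")

    def call(self, x):
        # tanh(.) * sigmoid(.) keeps W in (-1, 1), biased toward {-1, 0, 1}.
        W = tf.tanh(self.w_hat) * tf.sigmoid(self.m_hat)
        return tf.matmul(x, W)
```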

NALU (Neural Arithmetic Logic Units)

NAC is able to solve the problem of addition/subtraction, but to also solve the problem of multiplication/division they came up with NALU, which consists of two NAC sub-cells: one capable of addition/subtraction and the other of multiplication/division.

It consists of these five equations:

  1. W = tanh(w_hat) * σ(m_hat)
  2. a = matmul(x, W)
  3. m = exp(matmul(log(|x| + ϵ), W))
  4. g = σ(matmul(x, G))
  5. y = g * a + (1 - g) * m

Where,

  1. w_hat, m_hat and G are randomly initialized weight matrices,
  2. ϵ is a small constant used to avoid the problem of log(0),
  3. x and y are the input and output layers respectively,
  4. g is the gate, whose values lie between 0 and 1.

Here the concept of a gate is added as the variable g, which acts as a learned switch: when g = 1 (on), the add/subtract sub-cell drives the output and the multiply/divide sub-cell is weighted by 0 (off), and vice versa.

The addition/subtraction path (a = matmul(x, W)) is identical to the original NAC, while the multiply/divide path operates in log space, which makes it capable of learning to multiply and divide (m = exp(matmul(log(|x| + ϵ), W))).
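Combining the five equations gives a NALU layer such as the sketch below. Following the paper, the two sub-cells share the same W; the initializer is again my own choice.

```python
import tensorflow as tf

class NALU(tf.keras.layers.Layer):
    """NALU: y = g * a + (1 - g) * m, a gated mix of two NAC sub-cells."""

    def __init__(self, units, epsilon=1e-7, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.epsilon = epsilon  # avoids log(0) in the multiplicative path

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.w_hat = self.add_weight(name="w_hat", shape=(in_dim, self.units),
                                     initializer="glorot_uniform")
        self.m_hat = self.add_weight(name="m_hat", shape=(in_dim, self.units),
                                     initializer="glorot_uniform")
        self.G = self.add_weight(name="G", shape=(in_dim, self.units),
                                 initializer="glorot_uniform")

    def call(self, x):
        W = tf.tanh(self.w_hat) * tf.sigmoid(self.m_hat)
        a = tf.matmul(x, W)                             # add/subtract sub-cell
        m = tf.exp(tf.matmul(                           # multiply/divide sub-cell
            tf.math.log(tf.abs(x) + self.epsilon), W))  # operates in log space
        g = tf.sigmoid(tf.matmul(x, self.G))            # gate g in (0, 1)
        return g * a + (1.0 - g) * m
```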

So, the NALU is capable of both interpolation and extrapolation.

Experiments performed with NAC and NALU models

In the paper, they also applied these models to different tasks to test the abilities of NAC and NALU. They found NALU to be very useful in problems such as:

  1. Learning tasks using different arithmetic functions (x+y, x-y, x-y+x, x*y, etc.); a toy training setup for this task is sketched after this list.
  2. A counting task using a recurrent network, in which images of different digits are fed to the model and the output should count the number of digits of each type.
  3. A language-to-number translation task, in which an expression like "five hundred fifteen" is fed to the network and the output should be 515. Here a NALU is applied on top of an LSTM in the output layer.
  4. NALU was also used with reinforcement learning to track time in a grid-world environment.
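For the first task, a toy version of the setup is easy to put together. Everything below (data ranges, layer sizes, epochs) is an illustrative choice of mine; the NALU layer is the one sketched earlier.

```python
import numpy as np
import tensorflow as tf

# Learn x + y on small values, test far outside the training range.
x_train = np.random.uniform(0.0, 10.0, size=(10000, 2)).astype("float32")
y_train = x_train.sum(axis=1, keepdims=True)
x_test = np.random.uniform(100.0, 200.0, size=(1000, 2)).astype("float32")
y_test = x_test.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([NALU(2), NALU(1)])  # NALU class defined above
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=50, batch_size=64, verbose=0)

# If the gates learn to select the additive sub-cells, the test error
# should stay low even though these inputs were never seen in training.
print(model.evaluate(x_test, y_test, verbose=0))
```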

Summary

We have seen that NAC and NALU can be applied to overcome the failure of numerical representations to generalize outside the range observed in the training data. As you have seen in this post, the NAC and NALU concepts are easy to grasp and apply. However, NALU will not necessarily be perfect for every task, so we have to test where it gives good results.

Referenced Research Paper: Neural Arithmetic Logic Units
