In this post, I will explain an Artificial Neural Network (ANN) architecture known as Long Short Term Memory (LSTM). It is a type of Recurrent Neural Network (RNN).
Comparing Different Types of Artificial Neural Networks (ANNs)
Before discussing LSTM, let us first understand the difference between a traditional Artificial Neural Network (ANN) and a Recurrent Neural Network (RNN). A plain feedforward ANN processes its input in the forward direction only, from the input layer to the output layer, so the forward pass by itself does not learn from errors. Such networks can also behave in ways that are hard to explain. Without an error signal, the network can only reproduce whatever mapping is already stored in its weights. In contrast, a Backpropagation ANN has an error function and computes the gradient of that error with respect to the weights of the network.
Backpropagation allows the error to propagate backward through the hidden layers of the network and drives the adjustment of the weights. This is how a backpropagation ANN learns.
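As a rough illustration of this idea, here is a minimal sketch of backpropagation on a tiny network with one input, one hidden unit, and one output. The names `train_step`, `w1`, `w2`, the squared-error loss, and all numeric values are assumptions chosen for the example, not something prescribed by any particular library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w1, w2, x, target, lr=0.5):
    """One forward and backward pass for a 1-1-1 network (illustrative only)."""
    # Forward pass: input -> hidden (sigmoid) -> output (linear)
    h = sigmoid(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2

    # Backward pass: propagate the error from the output back to the hidden layer
    dy = y - target                    # dLoss/dy
    grad_w2 = dy * h                   # dLoss/dw2
    dh = dy * w2                       # error reaching the hidden unit
    grad_w1 = dh * h * (1.0 - h) * x   # dLoss/dw1 via the sigmoid derivative

    # Gradient descent: adjust each weight against its gradient
    return w1 - lr * grad_w1, w2 - lr * grad_w2, loss

# Hypothetical starting weights and a single training example
w1, w2 = 0.5, -0.3
losses = []
for _ in range(200):
    w1, w2, loss = train_step(w1, w2, x=1.0, target=0.8)
    losses.append(loss)

print(losses[0], losses[-1])  # the loss should shrink as the weights adjust
```

The backward pass reuses the values computed in the forward pass (`h`, `y`), which is what makes pushing the error back through the layers cheap.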
Another type of Artificial Neural Network is known as a Recurrent Neural Network (RNN). RNNs are suitable when the data has a sequential pattern, such as text or speech. An RNN feeds its output (or hidden state) from one time step back in as an input at the next step. Hence, the current output depends on the previous outputs.
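The recurrence can be sketched with a scalar RNN, where a single number plays the role of the hidden state. The function name `rnn_forward` and the weights `w_x` and `w_h` are hypothetical values picked for illustration.

```python
import math

def rnn_forward(inputs, w_x=0.8, w_h=0.5, h0=0.0):
    """Run a scalar RNN over a sequence and return the state at every step."""
    h = h0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

# Two sequences that end with the same input but differ in their history
a = rnn_forward([1.0, 0.0, 1.0])
b = rnn_forward([-1.0, 0.0, 1.0])
print(a[-1], b[-1])  # the final states differ because the earlier inputs differ
```

Both sequences end with the same input, yet the final states differ, which is the sense in which the current output depends on what came before.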
The Drawback of Recurrent Neural Networks (RNNs)
RNNs suffer from a problem known as the Long-Term Dependency Problem. In other words, the gap between the point where information was learned and the point where the current task needs it may become very wide. Since Recurrent Neural Networks use backpropagation for learning, the partial derivative of the error is computed and fed back through the network in order to adjust the weights.
In an RNN, this partial derivative is a product of one factor per time step, so when those factors are small it shrinks rapidly as the gap grows. Whenever the partial derivative becomes very small and is multiplied by a small learning rate, the resulting weight update becomes too small. It effectively vanishes. As a result, the weights barely change and learning stalls. This situation is known as the Vanishing Gradient problem. Long Short Term Memory (LSTM) was created to overcome it.
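To make the shrinking concrete, the sketch below tracks how strongly the state of a scalar RNN depends on its initial state as the sequence gets longer. It assumes a tanh activation and a hypothetical recurrent weight `w_h = 0.5`; the function name `gradient_through_time` is made up for this example.

```python
import math

def gradient_through_time(inputs, w_x=0.8, w_h=0.5, h0=0.0):
    """Return d(h_t)/d(h_0) for every step t of a scalar tanh RNN."""
    h = h0
    grad = 1.0
    grads = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        # Chain rule: each step multiplies in w_h * tanh'(pre-activation),
        # and tanh'(z) = 1 - tanh(z)^2, which is at most 1
        grad *= w_h * (1.0 - h * h)
        grads.append(grad)
    return grads

grads = gradient_through_time([1.0] * 50)
print(grads[0], grads[9], grads[49])  # each value is smaller than the one before
```

Because every factor here is below 0.5 in magnitude, the product after 50 steps is far too small to move the weights in any meaningful way, even before it is multiplied by the learning rate.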
Long Short Term Memory (LSTM)
To mitigate the problem of vanishing gradients, the LSTM network maintains a state known as the Cell State. Each cell in an LSTM has gates that control the flow of information and determine what information is remembered and what is discarded. A cell has the following gates.
- The Forget Gate takes the previous hidden state and the current input, and uses a sigmoid function to decide which information in the cell state should be carried forward and which should be discarded.
- The Input Gate decides which new information from the current input vector is added to the cell state.
- The Output Gate uses a sigmoid function to determine which part of the cell state is provided as the output.
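The three gates can be sketched as a single step of a scalar LSTM cell. This follows the commonly used LSTM formulation, simplified to single numbers instead of vectors. The function name `lstm_step`, the layout of `params`, and all weight values are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell. params maps a name to (w_h, w_x, bias)."""
    def pre(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x + b

    f = sigmoid(pre("forget"))        # forget gate: how much old cell state to keep
    i = sigmoid(pre("input"))         # input gate: how much new information to admit
    g = math.tanh(pre("candidate"))   # candidate value built from the current input
    o = sigmoid(pre("output"))        # output gate: how much of the cell to expose

    c = f * c_prev + i * g            # updated cell state
    h = o * math.tanh(c)              # output (hidden state) for this step
    return h, c

# Hypothetical weights; a trained network would learn these
params = {
    "forget": (0.5, 0.5, 1.0),
    "input": (0.5, 0.5, 0.0),
    "candidate": (0.5, 0.5, 0.0),
    "output": (0.5, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state is updated by addition (`f * c_prev + i * g`) rather than by being pushed through a squashing function at every step. When the forget gate stays close to 1, information and gradients can pass along the cell state over many steps with little shrinkage.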
This article on Long Short Term Memory, an artificial recurrent neural network architecture, describes the different variants of Artificial Neural Networks (ANNs) and compares them. It also briefly describes the Recurrent Neural Network (RNN) and explains the Vanishing Gradient problem. Finally, it describes the Long Short Term Memory architecture.