Building the architecture
Each neural network consists of three sets of layers—input, hidden, and output. There is always one input and one output layer. If the neural network is deep, it has multiple hidden layers:
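To make this layered structure concrete, here is a minimal sketch of a feedforward network with one hidden layer. The layer sizes, the random weights, and the use of NumPy are purely illustrative assumptions, not part of the example we build later:

```python
import numpy as np

# Minimal feedforward network: input layer -> one hidden layer -> output layer.
# Sizes are arbitrary, chosen only for illustration.
input_size, hidden_size, output_size = 4, 8, 3

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(input_size, hidden_size))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(hidden_size, output_size))  # hidden -> output weights

def forward(x):
    hidden = np.tanh(x @ W1)   # hidden layer activation
    output = hidden @ W2       # output layer (raw scores)
    return output

print(forward(rng.normal(size=(1, input_size))).shape)  # (1, 3)
```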
The difference between an RNN and a standard feedforward network lies in the cyclical hidden states. As the following diagram shows, recurrent neural networks use cyclical hidden states, so data propagates from one time step to the next and each step depends on the previous one:
A common practice is to unfold the preceding diagram to make it easier to follow. After rotating the illustration vertically and adding some notation and labels based on the example we picked earlier (generating a new chapter from The Hunger Games books), we end up with the following diagram:
This is an unfolded RNN with one hidden layer. The identical-looking sets of (input + hidden RNN unit + output) are actually the different time steps (or cycles) in the RNN. For example, the combination of x_{t-1} + RNN + y_{t-1} illustrates what is happening at time step t-1. At each time step, the following operations take place (a small code sketch after this list walks through one such step):
- The network encodes the word at the current time step (for example, t-1) using a word embedding technique and produces a vector (the produced vector is x_{t-1}, x_t, and so on, depending on the specific time step)
- Then x_{t-1}, the encoded version of the input word I at time step t-1, is plugged into the RNN cell (located in the hidden layer). After several equations (not displayed here, but happening inside the RNN cell), the cell produces an output y_{t-1} and a memory state h_{t-1}. The memory state is the result of the input x_{t-1} and the previous value of that memory state, h_{t-2}. For the initial time step, one can assume that h_{t-2} is a zero vector
- Producing the actual word (volunteer) at time step t-1 happens after decoding the output y_{t-1} using the text corpus specified at the beginning of training
- Finally, the network moves multiple time steps forward until it reaches the final step, where it predicts the next word in the sequence
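To make a single time step concrete, the following NumPy sketch mirrors the encode, update-memory-state, and decode operations just described. The toy vocabulary, the dimensions, and the random weights are assumptions made purely for illustration; in a real model, the weights are learned during training and the vocabulary comes from the training corpus:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and dimensions (assumptions for illustration only).
vocab = ["I", "volunteer", "as", "the"]
vocab_size, embed_size, hidden_size = len(vocab), 5, 8

E = rng.normal(scale=0.1, size=(vocab_size, embed_size))    # word embeddings
W = rng.normal(scale=0.1, size=(embed_size, hidden_size))   # input -> hidden
U = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (recurrence)
V = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # hidden -> output

def rnn_step(word, h_prev):
    x = E[vocab.index(word)]                  # encode the word as a vector x_t
    h = np.tanh(x @ W + h_prev @ U)           # new memory state h_t (depends on x_t and h_{t-1})
    logits = h @ V                            # output y_t
    probs = np.exp(logits) / np.exp(logits).sum()
    next_word = vocab[int(np.argmax(probs))]  # decode the output against the vocabulary
    return next_word, h

h = np.zeros(hidden_size)          # initial memory state is a zero vector
predicted, h = rnn_step("I", h)    # one time step: feed "I", predict the next word
print(predicted)
```

Calling `rnn_step` repeatedly while passing the returned memory state back in is exactly the movement from one time step to the next described above.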
You can see how each one of the memory states {…, h_{t-1}, h_t, h_{t+1}, …} holds information about all the previous inputs. This is what makes RNNs so well suited to predicting the next unit in a sequence. Let's now see what mathematical equations sit behind the preceding operations.
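As a code-level preview of that unrolled dependency (the equations themselves follow shortly), here is a short sketch of the loop that carries the memory state forward. The weights and the input sequence are random placeholders, assumed only for illustration:

```python
import numpy as np

# Sketch of how the memory state accumulates information across time steps.
rng = np.random.default_rng(0)
embed_size, hidden_size = 5, 8
W = rng.normal(scale=0.1, size=(embed_size, hidden_size))   # input -> hidden
U = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden

xs = rng.normal(size=(4, embed_size))  # four already-encoded input words
h = np.zeros(hidden_size)              # h starts as a zero vector
for t, x in enumerate(xs):
    h = np.tanh(x @ W + h @ U)         # h_t depends on x_t and h_{t-1},
                                       # and therefore on every earlier input
    print(f"step {t}: h now carries information about inputs 0..{t}")
```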