Recurrent Neural Networks with Python Quick Start Guide
上QQ阅读APP看书,第一时间看更新

Building the architecture 

Each neural network consists of three sets of layers—input, hidden, and output. There is always one input and one output layer. If the neural network is deep, it has multiple hidden layers:

The difference between an RNN and the standard feedforward network comes in the cyclical hidden states. As seen in the following diagram, recurrent neural networks use cyclical hidden states. This way, data propagates from one time step to another, making each one of these steps dependent on the previous:

A common practice is to unfold the preceding diagram for better and more fluent understanding. After rotating the illustration vertically and adding some notations and labels, based on the example we picked earlier (generating a new chapter based on The Hunger Games books), we end up with the following diagram:

This is an unfolded RNN with one hidden layer. The identically looking sets of (input + hidden RNN unit + output) are actually the different time steps (or cycles) in the RNN. For example, the combination of  + RNN +  illustrates what is happening at time step  . At each time step, these operations perform as follows:

  1. The network encodes the word at the current time step (for example, t-1) using any of the word embedding techniques and produces a vector  (The produced vector can be  or  depending on the specific time step)
  2. Then, , the encoded version of the input word I at time step t-1, is plugged into the RNN cell (located in the hidden layer). After several equations (not displayed here but happening inside the RNN cell), the cell produces an output  and a memory state . The memory state is the result of the input  and the previous value of that memory state . For the initial time step, one can assume that  is a zero vector
  3. Producing the actual word (volunteer) at time step t-1 happens after decoding the output  using a text corpus specified at the beginning of the training 
  4. Finally, the network moves multiple time steps forward until reaching the final step where it predicts the word

You can see how each one of {…, , …} holds information about all the previous inputs. This makes RNNs very special and really good at predicting the next unit in a sequence. Let's now see what mathematical equations sit behind the preceding operations.

Text corpus—an array of all words in the example vocabulary.