Alex Graves - Multi-Dimensional Recurrent Neural Networks (2007)

Created: June 16, 2017 / Updated: March 22, 2020 / Status: finished / 3 min read (~447 words)

  • The basic idea of MDRNNs is to replace the single recurrent connection found in standard RNNs with as many recurrent connections as there are dimensions in the data
  • The data must be processed in such a way that when the network reaches a point in an n-dimensional sequence, it has already passed through all the points from which it will receive its previous activations
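Below is a minimal sketch of that idea for the 2D case with a plain tanh recurrence (the paper's experiments use LSTM cells instead); `mdrnn_2d_forward` and its weight names are made up for illustration. The hidden state at point (i, j) is computed from the current input plus the two already-visited neighbours (i-1, j) and (i, j-1):

```python
import numpy as np

def mdrnn_2d_forward(x, W_in, W_rec_i, W_rec_j, b):
    """Scan a 2D input x of shape (H, W, in_size) so that h[i, j] only
    depends on points already visited: h[i-1, j] and h[i, j-1]."""
    H, W, _ = x.shape
    n_hidden = W_in.shape[0]
    h = np.zeros((H, W, n_hidden))
    for i in range(H):
        for j in range(W):
            prev_i = h[i - 1, j] if i > 0 else np.zeros(n_hidden)
            prev_j = h[i, j - 1] if j > 0 else np.zeros(n_hidden)
            # one recurrent connection per dimension instead of a single one
            h[i, j] = np.tanh(W_in @ x[i, j] + W_rec_i @ prev_i
                              + W_rec_j @ prev_j + b)
    return h
```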

  • BRNNs (Bidirectional RNNs) contain two separate hidden layers that process the input sequence in the forward and reverse directions. The two hidden layers are connected to a single output layer, thereby providing the network with access to both past and future context
  • BRNNs can be extended to n-dimensional data by using $2^n$ separate hidden layers, each of which processes the sequence under the same ordering constraint (the network must already have passed through all the points from which it receives its previous activations), but with a different choice of axes. More specifically, the axes are chosen so that their origins lie on the $2^n$ vertices of the sequence, as sketched below
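One way to sketch this in 2D (where $2^n = 4$): flip the input along each subset of axes, run the same causal scan, flip the result back, and concatenate, so every point ends up with context from all four quadrants. This reuses the hypothetical `mdrnn_2d_forward` from the sketch above; the paper does not prescribe this exact implementation:

```python
import numpy as np
from itertools import product

def multidirectional_2d(x, layers):
    """Run one scan per choice of axis orientation (2^2 = 4 directions in 2D).
    `layers` holds 4 parameter tuples for mdrnn_2d_forward, one per direction."""
    outputs = []
    for flips, params in zip(product([False, True], repeat=2), layers):
        axes = tuple(ax for ax, f in enumerate(flips) if f)
        xd = np.flip(x, axis=axes) if axes else x              # reverse chosen axes
        h = mdrnn_2d_forward(xd, *params)                      # same causal scan
        outputs.append(np.flip(h, axis=axes) if axes else h)   # undo the flip
    return np.concatenate(outputs, axis=-1)                    # fed to the output layer
```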

  • For standard RNN architectures, the range of context that can practically be used is limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network's recurrent connections. This is usually referred to as the vanishing gradient problem
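As a toy numerical illustration (not from the paper): backpropagating through a scalar recurrent weight $w$ multiplies the gradient by $w$ once per time step, so the contribution of an input $t$ steps away scales like $w^t$ and either decays or blows up exponentially:

```python
w = 0.9   # recurrent weight with |w| < 1: influence decays with distance
print([round(w ** t, 4) for t in (1, 10, 50)])   # [0.9, 0.3487, 0.0052]
w = 1.1   # |w| > 1: influence blows up
print([round(w ** t, 2) for t in (1, 10, 50)])   # [1.1, 2.59, 117.39]
```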

  • Multi-directional MDRNN with 4 LSTM hidden layers
  • Each layer consists of 25 memory blocks, each containing 1 cell, 2 forget gates, 1 input gate, 1 output gate and 5 peephole weights
  • Input is size 3 (the red, green, and blue components of the pixels)
  • Output is size 155 (one unit for each textural class)
  • Input and output activation functions are tanh, and the activation function for the gates is the logistic sigmoid
  • Softmax activation at the output layer, with cross-entropy objective function

pixels
(
-> 25 LSTM(2 forget, 1 input, 1 output, 5 peephole, tanh)
-> 25 LSTM(2 forget, 1 input, 1 output, 5 peephole, tanh)
-> 25 LSTM(2 forget, 1 input, 1 output, 5 peephole, tanh)
-> 25 LSTM(2 forget, 1 input, 1 output, 5 peephole, tanh)
)
-> softmax
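A rough sketch of how the output side of this architecture could look: the activations of the four directional hidden layers (25 units each) feed a per-pixel softmax over the 155 texture classes with a cross-entropy objective. The dimensions follow the bullets above; `output_layer`, `W_out`, and the concatenation into a single feature map are assumptions for illustration:

```python
import numpy as np

def output_layer(hidden_maps, W_out, targets):
    """hidden_maps: (H, W, 4 * 25) concatenated directional activations.
    W_out: (155, 4 * 25) output weights. targets: (H, W) integer texture labels."""
    logits = hidden_maps @ W_out.T                  # (H, W, 155)
    logits -= logits.max(axis=-1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)      # per-pixel softmax
    rows = np.arange(targets.shape[0])[:, None]
    cols = np.arange(targets.shape[1])
    ce = -np.log(probs[rows, cols, targets])        # cross-entropy per pixel
    return probs, ce.mean()
```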

  • We carried out a slightly modified task where each pixel was classified according to the digit it belonged to, with an additional class for background pixels
  • The results on the warped MNIST data set suggest that MDRNNs are more robust to input warping than convolutional networks

  • One benefit of two-dimensional tasks is that the operation of the network can be easily visualized

  • Graves, Alex, Santiago Fernández, and Jürgen Schmidhuber. "Multidimensional Recurrent Neural Networks." Proceedings of the International Conference on Artificial Neural Networks (ICANN). 2007.