Doctoral thesis

New architectures for very deep learning


116 p

Doctoral thesis: Università della Svizzera italiana, 2018

Artificial neural networks are increasingly being used in complex real-world applications because many-layered (i.e., deep) architectures can now be trained on large quantities of data. However, training even deeper, and therefore more powerful, networks has hit a barrier due to fundamental limitations in the design of existing networks. This thesis develops new architectures that, for the first time, allow very deep networks to be optimized efficiently and reliably. Specifically, it addresses two key issues that hamper credit assignment in neural networks: cross-pattern interference and vanishing gradients. Cross-pattern interference leads to oscillations of the network's weights that make training inefficient. The proposed Local Winner-Take-All networks reduce interference among computation units in the same layer through local competition. An in-depth analysis of locally competitive networks provides generalizable insights and reveals unifying properties that improve credit assignment. As network depth increases, vanishing gradients make a network's outputs increasingly insensitive to the weights close to the inputs, causing gradient-based training to fail. To overcome this limitation, the proposed Highway networks regulate information flow across layers through additional skip connections that are modulated by learned computation units. Their beneficial properties are extended to the sequential domain with Recurrent Highway Networks, which gain from increased depth and learn complex sequential transitions without requiring more parameters.
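As a rough illustration of the two mechanisms named in the abstract, the following minimal Python sketch shows one local winner-take-all step and one gated skip connection. It is not taken from the thesis: the function names, the block size, the tanh transform, and the coupled gate form y = t * h + (1 - t) * x (as commonly published for Highway networks) are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic function, used here for the learned gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Compute W @ x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def lwta(activations, block_size):
    """Local competition (assumed form): within each block of units,
    only the unit with the largest activation keeps its value;
    the others in that block output zero."""
    out = []
    for start in range(0, len(activations), block_size):
        block = activations[start:start + block_size]
        winner = max(range(len(block)), key=lambda i: block[i])
        out.extend(a if i == winner else 0.0 for i, a in enumerate(block))
    return out


def highway_layer(x, W_h, b_h, W_t, b_t):
    """Gated skip connection (assumed coupled-gate form):
    h is the transformed input, t is a learned gate in (0, 1),
    and the output mixes h with the untouched input x."""
    h = [math.tanh(z) for z in affine(W_h, b_h, x)]
    t = [sigmoid(z) for z in affine(W_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


if __name__ == "__main__":
    x = [0.5, -1.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # A strongly negative gate bias closes the gate, so the layer copies x.
    print(highway_layer(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0]))
    # Two blocks of two units each: one winner survives per block.
    print(lwta([0.2, 0.9, -1.0, -0.5], block_size=2))
```

When the gate is closed, the layer passes its input through unchanged, which is the property that lets signals and gradients cross many layers. When the gate is open, the layer behaves like an ordinary nonlinear layer.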
  • English
Computer science
License undefined
Persistent URL
