Doctoral thesis

Fast weight programmers for greater systematic generalisation in language

  • 2023

PhD: Università della Svizzera italiana

Over the past decade, deep neural network models have advanced significantly and achieved impressive results across domains including computer vision, natural language processing, and game playing. However, there is an ongoing debate about whether connectionist models can serve as a substrate for general AI, given their lack of systematicity, which persists in modern deep learning models. To address this challenge, we propose the use of Fast Weight Programmers (FWPs) to enable structured representations and adaptive, context-specific computations. An FWP is a two-network system introduced in the early '90s, in which a slow network with regular weights continuously updates the fast weights of a fast network. The fast weights therefore depend on the context of the current input data. In this work, we present novel neural architectures that build upon existing FWPs and contemporary neural networks to improve their systematicity. We also establish the formal equivalence of FWPs and linear Transformers, a variant of the Transformer architecture that linearises the attention mechanism for improved scalability. We demonstrate that modern FWP models can facilitate more structured representations and adaptive, context-specific computation, leading to significant improvements in tasks such as question answering, machine translation, and natural language modelling.
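The equivalence between FWPs and linear Transformers mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the feature map `elu(x) + 1`, and the omission of the attention normalisation term are all assumptions made for illustration, and the keys, values, and queries are passed in directly rather than produced by a slow network. Under those assumptions, accumulating outer products of values and keys into a fast weight matrix gives the same outputs as causal, unnormalised linear attention.

```python
import math


def feature_map(x):
    # Assumed positive feature map phi(x) = elu(x) + 1 (hypothetical choice).
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def fast_weight_outputs(keys, values, queries):
    """FWP view: the fast weight matrix W is updated by an outer product
    at every step, then applied to the current query."""
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]  # fast weights, start at zero
    outputs = []
    for k, v, q in zip(keys, values, queries):
        phi_k, phi_q = feature_map(k), feature_map(q)
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * phi_k[j]  # W <- W + v (outer) phi(k)
        outputs.append(
            [sum(W[i][j] * phi_q[j] for j in range(d_k)) for i in range(d_v)]
        )
    return outputs


def linear_attention_outputs(keys, values, queries):
    """Attention view: each output is a sum of past values weighted by
    phi(k_i) . phi(q_t), without softmax and without normalisation."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        phi_q = feature_map(q)
        y = [0.0] * d_v
        for k, v in zip(keys[: t + 1], values[: t + 1]):
            score = sum(a * b for a, b in zip(feature_map(k), phi_q))
            for i in range(d_v):
                y[i] += score * v[i]
        outputs.append(y)
    return outputs
```

Both functions return the same sequence of outputs up to floating-point rounding. The fast-weight form keeps only a fixed-size matrix as state, whereas the attention form revisits every earlier position at each step.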
  • English
Computer science and technology
