Fast weight programmers for greater systematic generalisation in language

Schlag, Imanol

Doctoral thesis

Schlag, Imanol
Schmidhuber, Jürgen (Degree supervisor)

2023

PhD: Università della Svizzera italiana

Over the past decade, deep neural network models have made significant advancements and achieved impressive results across various domains, including computer vision, natural language processing, and game playing. However, there is an ongoing debate questioning the ability of connectionist models to serve as a substrate for general AI due to their lack of systematicity, which continues to persist in modern deep learning models. To address this challenge, we propose the use of Fast Weight Programmers (FWPs) to enable structured representations and adaptive, context-specific computations. An FWP is a two-network system introduced in the early '90s, where a slow network with regular weights continuously updates the fast weights of a fast network. This makes the fast weights dependent on the context of the current input data, resulting in several benefits. In this work, we present novel neural architectures that build upon existing FWPs and contemporary neural networks to improve their systematicity. We also establish the formal equivalence of FWPs and linear Transformers, a variant of the Transformer architecture that linearises the attention mechanism for improved scalability. We demonstrate that modern FWP models can facilitate more structured representations and adaptive context-specific computation, leading to significant improvements in tasks such as question answering, machine translation, and natural language modelling.
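The equivalence mentioned in the abstract can be illustrated with a small sketch. This is not code from the thesis; it is a minimal illustration under stated assumptions: the keys, values, and queries are taken as given vectors (in an actual FWP the slow network would generate them from the input), the fast weights are updated by a plain additive outer product, and the feature map and normalisation used in practical linear Transformers are omitted. The function names `fwp_outputs` and `linear_attention_outputs` are hypothetical.

```python
def outer(a, b):
    """Outer product a b^T as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]


def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def fwp_outputs(keys, values, queries):
    """Fast-weight view: accumulate W += v k^T, then read out y = W q.

    Assumption: keys/values/queries are supplied directly; a slow
    network would normally produce them at every step.
    """
    d_out, d_in = len(values[0]), len(keys[0])
    fast_weights = [[0.0] * d_in for _ in range(d_out)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        update = outer(v, k)
        fast_weights = [
            [w + u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(fast_weights, update)
        ]
        outputs.append(matvec(fast_weights, q))
    return outputs


def linear_attention_outputs(keys, values, queries):
    """Attention view: y_t = sum over i <= t of (k_i . q_t) * v_i.

    Assumption: no softmax, no feature map, no normalisation.
    """
    outputs = []
    for t, q in enumerate(queries):
        y = [0.0] * len(values[0])
        for k, v in zip(keys[: t + 1], values[: t + 1]):
            score = sum(ki * qi for ki, qi in zip(k, q))
            y = [yi + score * vi for yi, vi in zip(y, v)]
        outputs.append(y)
    return outputs
```

Under these assumptions both functions compute the same outputs: the first keeps a running weight matrix of fixed size, while the second revisits all earlier key-value pairs at each step. The fixed-size state in the first form is what makes the linearised attention mechanism scale better with sequence length.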

Collections

USI Faculty of Informatics

Language

English

Classification

Computer science and technology

License

License undefined

Open access status

green

Identifiers

NDP-USI 2023INF009
URN urn:nbn:ch:rero-006-120500
ARK ark:/12658/srd1326257

Persistent URL

https://n2t.net/ark:/12658/srd1326257

Statistics

Document views: 232

File downloads: 432 (2023INF009.pdf)

Machine learning

Artificial intelligence

Artificial neural networks

Language model

Fast weight programmer
