Learning structured neural representations for visual reasoning tasks

van Steenkiste, Sjoerd

Doctoral thesis

Schmidhuber, Jürgen (Degree supervisor)

04.11.2020

236 p

Doctoral thesis: Università della Svizzera italiana, 2020

Deep neural networks learn representations of data to facilitate problem-solving in their respective domains. However, they struggle to acquire a structured representation based on more symbolic entities, which are commonly understood as core abstractions central to the human capacity for generalization. This dissertation studies this issue for visual reasoning tasks. Inspired by how humans solve these tasks, we propose to learn structured neural representations that distinguish objects: abstract visual building blocks that can be composed and reasoned with separately. We investigate the limitations of current deep neural networks in effectively discovering, representing, and relating these more symbolic entities, and present several improvements. To address the problem of discovering and representing objects, we propose two novel approaches. In the first, we formalize this problem as a pixel-level clustering problem and formulate a neural differentiable clustering algorithm that solves it. We demonstrate how, unlike standard representation learning techniques, it can be trained to learn about objects in an unsupervised manner and acquire corresponding representations that can be treated as symbols for reasoning. In the second, we adopt a purely generative approach and demonstrate how a neural network equipped with the right inductive bias can learn about objects in the process of synthesizing images, even in complex visual settings. Concerning the problem of relating symbolic entities with neural networks, we investigate how object representations can facilitate building structured models for common-sense physical reasoning that generalize more systematically. We extend our clustering-based representation learning approach to facilitate model building in this way and demonstrate how it can learn about general relations between objects to reason about their (future) physical interactions.
Finally, we investigate the utility of a representational format that isolates independent sources of information for encoding the features of individual objects. We conduct a large-scale study of such 'disentangled' representations that includes various methods and metrics on two new abstract visual reasoning tasks. Our results indicate that better disentanglement enables quicker learning using fewer samples.

Language

English

Classification

Computer science and technology

License

License undefined

Identifiers

RERO DOC 329836
URN urn:nbn:ch:rero-006-119032
ARK ark:/12658/srd1319286

Persistent URL

https://n2t.net/ark:/12658/srd1319286

Statistics

Document views: 356

File downloads:

2020INFO019.pdf: 692

Artificial intelligence

Deep learning

Neural networks

Representation learning

Reasoning

Objects

Neuro-symbolic AI

Vision

Binding problem
