Towards human-like perception with connectionist models

Gopalakrishnan, Anand


Doctoral thesis

Schmidhuber, Jürgen (Degree supervisor)

2025

PhD: Università della Svizzera italiana

The last decade has seen rapid progress in Artificial Intelligence (AI) capabilities across domains such as game-playing, language understanding, multimedia generation and scientific applications, with neural networks (NNs) as the centerpiece. Large NNs trained on internet-scale datasets (Foundation Models) have led to breakthrough task performance across vision, audio and language modalities. However, these Foundation Models continue to struggle on tasks that require a fine-grained understanding of inputs, such as objects and their relations. This suggests that, compared to humans, they have a limited ability to comprehend the underlying structure of sensory inputs. Such failures can be seen as manifestations of an inability to form and relate symbol-like representations of objects, i.e., to resolve the binding problem. This dissertation focuses on the challenge of grouping sensory inputs into modular (object-centric) representations using NNs without supervision. In the first half, we develop a new method to discover object keypoints that are more robust to distractors and that alleviate certain systematic biases of previous methods. We extend slot-based models, typically designed to spatially group pixels into visual objects, to temporally group state-action sequences into sub-routines. We develop new masking and decoding methods that enable each slot to model contiguous input elements. The second half focuses on synchrony-based models, an alternative class of object-centric models to slots, which use the phase components of complex-valued activations to store object bindings. We design new contrastive training procedures for synchrony-based models that improve phase synchronization and object storage capacity. We then refine these ideas by simplifying the inductive biases and training process of synchrony-based models using complex-valued weights and recurrent computation.
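To make the synchrony idea concrete, here is a toy sketch (not the thesis's model or its grouping procedure) of how complex-valued activations can carry bindings: each unit's magnitude says whether a feature is present, and its phase says which object it belongs to. Units are grouped by greedy clustering on circular phase distance. The function names `circular_distance` and `group_by_phase` and the parameters `tol` and `min_magnitude` are hypothetical choices made for illustration.

```python
import cmath
import math


def circular_distance(a: float, b: float) -> float:
    """Smallest absolute angle between two phases, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def group_by_phase(activations, tol=0.5, min_magnitude=0.1):
    """Toy grouping of complex activations by phase (hypothetical helper).

    Units with magnitude below `min_magnitude` count as inactive and get the
    label None. Each active unit joins the first group whose reference phase
    (the phase of that group's first member) lies within `tol` radians;
    otherwise it starts a new group.
    """
    refs = []    # reference phase per group
    labels = []  # group index per unit, or None for inactive units
    for z in activations:
        if abs(z) < min_magnitude:
            labels.append(None)
            continue
        phi = cmath.phase(z)  # phase in (-pi, pi]
        for g, ref in enumerate(refs):
            if circular_distance(phi, ref) <= tol:
                labels.append(g)
                break
        else:  # no existing group was close enough
            refs.append(phi)
            labels.append(len(refs) - 1)
    return labels


activations = [
    cmath.rect(1.0, 0.1),    # object A
    cmath.rect(0.8, 3.0),    # object B
    cmath.rect(0.9, -0.05),  # object A
    cmath.rect(0.05, 1.5),   # near-silent unit
    cmath.rect(0.7, -3.1),   # object B (phase wraps around +/- pi)
]
print(group_by_phase(activations))  # [0, 1, 0, None, 1]
```

The circular distance matters: phases -3.1 and 3.0 are about 0.18 rad apart on the circle even though their raw difference is 6.1.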
Lastly, we show how complex-valued activations allow a natural decoupling of content-based and position-based matches, and we design a powerful relative position encoding scheme for Transformer models. Broadly, we believe that a key component of human-level intelligence is our ability to construct abstract mental models of the world by composing structured primitives. We hope this thesis contributes towards the grand challenge of endowing machines with the same capacity.
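As a rough illustration of how complex numbers can separate "what" from "where" (a sketch under stated assumptions, not the encoding scheme designed in the thesis), suppose content is stored only in non-negative magnitudes and position only in a phase rotation, similar to the rotation used by rotary position embeddings (RoPE). The attention score Re(sum_d q_d conj(k_d)) then equals sum_d a_d b_d cos((m - n) omega_d): magnitudes supply the content match, and the phase contributes only the relative offset m - n. The helper names `encode` and `score`, and the geometric frequencies in `omegas`, are assumptions.

```python
import cmath
import math


def encode(magnitudes, position, omegas):
    """Hypothetical encoder: content in non-negative magnitudes, position in phases.

    Dimension d becomes a_d * exp(i * position * omega_d).
    """
    return [a * cmath.exp(1j * position * w) for a, w in zip(magnitudes, omegas)]


def score(q, k):
    """Complex analogue of a dot product: Re(sum_d q_d * conj(k_d))."""
    return sum((qd * kd.conjugate()).real for qd, kd in zip(q, k))


# Assumption: RoPE-style geometric frequencies over 4 dimensions.
omegas = [10000 ** (-d / 4) for d in range(4)]
q_mag = [1.0, 0.5, 2.0, 0.3]  # query content (magnitudes)
k_mag = [0.7, 1.2, 0.4, 0.9]  # key content (magnitudes)

# Query at position m and key at position n: shifting both by 8 leaves m - n = 3.
a = score(encode(q_mag, 5, omegas), encode(k_mag, 2, omegas))
b = score(encode(q_mag, 13, omegas), encode(k_mag, 10, omegas))
print(math.isclose(a, b, abs_tol=1e-9))  # True: only the offset m - n matters
```

At offset zero every cosine equals 1, so the score reduces to the pure content match sum_d a_d b_d. Rotating by position also never changes a vector's magnitudes, so the position phase cannot alter the stored content.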

Collections

USI Faculty of Informatics

Language

English

Classification

Computer science and technology

License

License undefined

Open access status

green

Identifiers

NDP-USI 2025INF017
ARK ark:/12658/srd1334461
URN urn:nbn:ch:rero-006-123599

Persistent URL

https://n2t.net/ark:/12658/srd1334461

Keywords

Artificial intelligence

Neural networks

Representation learning

Object perception

Binding
