Doctoral thesis

Strategies for practical deep learning

  • 2024

PhD: Università della Svizzera italiana

English: In recent years, deep neural networks have achieved breakthrough results in diverse domains, from computer vision and natural language processing to game playing and the life sciences. However, harnessing the full power of this technology in practical applications remains challenging. In this thesis, we explore strategies to address the challenges of applying deep learning to real-world pattern recognition problems. We tackle several practical problems: Optical Music Recognition (OMR), automated machine learning (AutoML), and the design of robust neural network architectures. In the context of OMR, we introduce two datasets, DeepScores and DeepScoresV2, the largest and most complete OMR datasets to date. Based on this data, we develop the first object detection method capable of handling the challenges of written music, as well as methods to harden neural networks against the effects of degraded real-world data, more than doubling detection performance on such data. We then investigate the current state of AutoML, introduce a novel AutoML method, and extract design patterns for resource-constrained AutoML settings. In the latter parts of this thesis, we focus on the underlying issues that often cause neural networks to generalize poorly to real-world data. We first investigate the dataset dependency of modern CNN architectures. We show through an extensive empirical study that ImageNet alone is insufficient to judge the power of CNN architectures, and we propose strategies for developing more universal evaluation methods. Finally, we tackle the lack of rotation invariance in modern vision systems and introduce a novel bio-inspired paradigm that significantly enhances rotational robustness and outperforms the current state of the art by 19%.
Language
  • English
Classification
Computer science and technology
License
License undefined
Open access status
green
Identifiers
Persistent URL
https://n2t.net/ark:/12658/srd1330766
Statistics

Document views: 14
File downloads:
  • 2024INF017.pdf: 20