Stochastic additively preconditioned trust-region strategies for distributed neural network training
PhD thesis, Università della Svizzera italiana
Training large-scale neural networks remains computationally expensive, largely due to the extensive hyperparameter tuning required by standard first-order optimizers like stochastic gradient descent (SGD) and Adam. This thesis addresses this bottleneck by adapting domain decomposition methods, traditionally used in scientific computing, to deep learning optimization. We propose the stochastic additively preconditioned trust-region strategy (SAPTS), a framework that integrates additive domain decomposition with trust-region optimization to improve numerical stability and reduce sensitivity to hyperparameter selection. We develop and formalize three SAPTS variants: one utilizing data parallelism and two employing parameter-space decomposition. These algorithms are implemented in PyTorch and evaluated across three diverse application domains: physics-informed neural networks (PINNs) for solving partial differential equations, image classification (MNIST and CIFAR-10), and sequential language modelling. Our empirical analysis characterizes the convergence behaviour, computational overhead, and scalability of SAPTS relative to established baselines. Results demonstrate that SAPTS is highly effective for physics-informed applications, achieving competitive performance with significantly less tuning than first-order methods. While SAPTS performs robustly on MNIST, we observe a performance gap in complex image tasks (CIFAR-10) and language modelling (TinyShakespeare), suggesting that highly nonconvex or sequential data structures present unique challenges for additive decomposition. We conclude that SAPTS is particularly well-suited for scientific machine learning and scenarios where tuning resources are constrained.
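The abstract outlines, but does not specify, how the trust-region and additive-decomposition components interact. Below is a minimal PyTorch sketch of one such step under stated assumptions: a first-order local model, gradient-based subdomain corrections clipped to the trust-region radius, and a standard accept/shrink/grow ratio test. The function name sapts_step, the decomposition into layer-wise subdomains, and all constants are illustrative choices for exposition, not the thesis's actual implementation.

```python
# Illustrative sketch of one stochastic additively preconditioned
# trust-region step with parameter-space decomposition. All names and
# constants are assumptions for exposition, not the thesis code.
import torch
import torch.nn as nn

def sapts_step(model, loss_fn, batch, subdomains, radius,
               shrink=0.5, grow=2.0, eta_low=0.25, eta_high=0.75,
               max_radius=1.0):
    """One SAPTS-style step; returns (loss value, updated radius)."""
    x, y = batch
    params = [p for sub in subdomains for p in sub]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)

    # Additive preconditioning: each subdomain proposes a local correction
    # (here a gradient step clipped to the trust-region radius); the global
    # trial step is the sum of the subdomain contributions.
    step, k = [], 0
    for sub in subdomains:
        g_local = grads[k:k + len(sub)]
        k += len(sub)
        g_norm = torch.sqrt(sum(g.pow(2).sum() for g in g_local))
        scale = torch.clamp(radius / (g_norm + 1e-12), max=1.0)
        step.extend(-scale * g for g in g_local)

    # Predicted decrease of the first-order model m(s) = f + g^T s.
    pred = -sum((g * s).sum() for g, s in zip(grads, step))

    # Apply the trial step, then accept or reject it by comparing the
    # actual loss reduction with the model's predicted reduction (rho).
    with torch.no_grad():
        for p, s in zip(params, step):
            p.add_(s)
        new_loss = loss_fn(model(x), y)
        rho = (loss - new_loss) / (pred + 1e-12)
        if rho < eta_low:            # poor model fit: revert and shrink
            for p, s in zip(params, step):
                p.sub_(s)
            radius = shrink * radius
        elif rho > eta_high:         # good fit: enlarge the trust region
            radius = min(grow * radius, max_radius)
    return loss.item(), radius

# Toy usage: two subdomains obtained by splitting a small MLP layer-wise.
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
subdomains = [list(net[0].parameters()), list(net[2].parameters())]
data = (torch.randn(32, 4), torch.randn(32, 1))
radius = 0.1
for _ in range(100):
    loss, radius = sapts_step(net, nn.MSELoss(), data, subdomains, radius)
```

Note that the ratio test adapts the radius automatically, which is one way such a scheme can reduce sensitivity to a hand-tuned learning rate; the thesis's variants (data-parallel and parameter-space) and their preconditioners may differ substantially from this sketch.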
Language
English
Classification
Computer science and technology
Open access status
green
Identifiers
Persistent URL
https://n2t.net/ark:/12658/srd1335040