Self-adaptivity of applications on network on chip multiprocessors

Derin, Onur


Doctoral thesis

Self-adaptivity of applications on network on chip multiprocessors : the case of fault-tolerant Kahn process networks

Derin, Onur
Sami, Mariagiovanna (Degree supervisor)

19.05.2015

185 p

Doctoral thesis: Università della Svizzera italiana, 2015

Multiprocessor systems-on-chips

Technology scaling, accompanied by higher operating frequencies and the ability to integrate more functionality in the same chip, has been the driving force behind delivering higher-performance computing systems at lower cost. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as a Multiprocessor System-on-Chip). However, these trends are hindered by issues that arise as technology scaling continues towards deep-submicron scales. Firstly, the growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors and as the probability of defects escaping post-manufacturing testing increases. In this thesis, we take on these challenges within the context of streaming applications running on network-on-chip-based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote-memory-access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping and (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as an aspect of self-adaptation, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at the system level, in particular by exploiting the redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design-time adoption) as well as heuristic-based solutions to be used at run time. We assess the impact of our approach on lifetime reliability.
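To make the run-time side of fault-aware remapping concrete, here is a minimal Python sketch of a greedy heuristic. It is an illustration, not the thesis's own algorithm. It assumes a 2D-mesh NoC, a compute load per task, a load capacity per processing element (PE), and a traffic volume per channel between Kahn processes. When a PE fails, its orphaned tasks are moved, heaviest first, to the surviving PE that minimizes the resulting load plus hop-weighted communication cost. The names `remap_on_failure`, `capacity` and `alpha` are hypothetical.

```python
def manhattan(a, b):
    """Hop count between two (x, y) positions on a 2D mesh with minimal routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def remap_on_failure(mapping, failed_pe, pe_pos, load, traffic, capacity, alpha=1.0):
    """Greedily move the tasks of `failed_pe` onto surviving PEs (hypothetical heuristic).

    mapping:  dict task -> PE id
    pe_pos:   dict PE id -> (x, y) mesh coordinates
    load:     dict task -> compute load
    traffic:  dict (task_a, task_b) -> channel volume, treated as undirected
    capacity: maximum total load a single PE may host
    alpha:    weight of communication cost relative to PE load
    """
    new_map = dict(mapping)
    survivors = [p for p in pe_pos if p != failed_pe]
    pe_load = {p: 0.0 for p in survivors}
    for task, pe in mapping.items():
        if pe != failed_pe:
            pe_load[pe] += load[task]

    # Place the heaviest orphaned tasks first, while the most capacity is left.
    orphans = sorted((t for t, p in mapping.items() if p == failed_pe),
                     key=lambda t: load[t], reverse=True)
    for task in orphans:
        best, best_cost = None, float("inf")
        for pe in survivors:
            if pe_load[pe] + load[task] > capacity:
                continue  # this PE would be overloaded
            comm = 0.0
            for (a, b), volume in traffic.items():
                peer = b if a == task else a if b == task else None
                if peer is None or new_map[peer] == failed_pe:
                    continue  # unrelated channel, or peer not yet remapped
                comm += volume * manhattan(pe_pos[pe], pe_pos[new_map[peer]])
            cost = pe_load[pe] + load[task] + alpha * comm
            if cost < best_cost:
                best, best_cost = pe, cost
        if best is None:
            raise RuntimeError(f"no surviving PE can host task {task!r}")
        new_map[task] = best
        pe_load[best] += load[task]
    return new_map


if __name__ == "__main__":
    # 2x2 mesh: PE ids 0..3; a three-stage pipeline src -> filt -> sink.
    pe_pos = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
    mapping = {"src": 0, "filt": 1, "sink": 3}
    load = {"src": 1.0, "filt": 1.0, "sink": 1.0}
    traffic = {("src", "filt"): 10.0, ("filt", "sink"): 10.0}
    print(remap_on_failure(mapping, 1, pe_pos, load, traffic, capacity=2.0))
    # prints {'src': 0, 'filt': 2, 'sink': 3}
```

In the 2x2 example, PE 1 fails and orphans the filter task. That task lands on PE 2, which is one hop from both its producer and its consumer. An ILP formulation would instead search all assignments for an optimum. That cost is why the abstract pairs the ILP with design time and the heuristics with run time.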
We propose two recovery schemes, one based on checkpoint-and-rollback and one based on a rollforward technique. For the latter, we propose two variants of a monitor-controller-adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that they can be achieved without incurring large overheads. In addressing these problems, we present techniques that have been realized (depending on their characteristics) in the form of a design tool, a run-time library, or a hardware core to be added to the basic architecture.
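Here is a minimal Python sketch of a monitor-controller-adapter loop. It does not reproduce either of the two variants in the thesis, and its parameters are assumptions:

- a single integer quality level;
- a hypothetical linear cost model of 10 ms per level per iteration;
- a throughput target of 25 iterations per second;
- a ±10% dead band.

The class and function names are illustrative. The monitor averages over a sliding window. The controller returns a one-step decision. The adapter clamps the parameter to its allowed range.

```python
from collections import deque


class Monitor:
    """Records per-iteration execution times over a sliding window."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)

    def record(self, seconds):
        self.samples.append(seconds)

    def full(self):
        return len(self.samples) == self.samples.maxlen

    def throughput(self):
        """Iterations per second over the current window."""
        return len(self.samples) / sum(self.samples)


class Controller:
    """Maps the gap between measured and target throughput to a step of -1, 0 or +1."""

    def __init__(self, target, tolerance):
        self.target = target
        self.tolerance = tolerance

    def decide(self, measured):
        if measured < self.target * (1 - self.tolerance):
            return -1  # too slow: lower the quality level
        if measured > self.target * (1 + self.tolerance):
            return +1  # spare headroom: raise the quality level
        return 0


class Adapter:
    """Applies a step to an application-level parameter, clamped to [low, high]."""

    def __init__(self, level, low, high):
        self.level, self.low, self.high = level, low, high

    def apply(self, step):
        self.level = max(self.low, min(self.high, self.level + step))
        return self.level


def simulated_iteration_time(level):
    # Hypothetical cost model: each quality level adds 10 ms per iteration.
    return 0.01 * level


def run(iterations, start_level, target=25.0, tolerance=0.1, window=5):
    monitor = Monitor(window)
    controller = Controller(target, tolerance)
    adapter = Adapter(start_level, low=1, high=10)
    for _ in range(iterations):
        monitor.record(simulated_iteration_time(adapter.level))
        if monitor.full():
            step = controller.decide(monitor.throughput())
            if step:
                adapter.apply(step)
                monitor.samples.clear()  # drop samples measured at the old level
    return adapter.level


if __name__ == "__main__":
    print(run(100, start_level=8))  # prints 4
    print(run(100, start_level=1))  # prints 4
```

After each change, the loop clears the window. This stops the controller from reacting to measurements taken under the previous setting. The dead band keeps the loop from oscillating between neighbouring levels. From either starting level, the loop settles at level 4, which gives 40 ms per iteration and 25 iterations per second.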

Language

English

Classification

Computer science and technology

License

License undefined

Identifiers

RERO DOC 257421
URN urn:nbn:ch:rero-006-114602
ARK ark:/12658/srd1318454

Persistent URL

https://n2t.net/ark:/12658/srd1318454



Keywords

Fault tolerance

High performance computing

Kahn process networks

Multiprocessor systems-on-chips

Networks-on-chips

Optimization

Polyhedral process networks

Reliability

Run-time management

Self-adaptation
