Self-adaptivity of applications on network on chip multiprocessors : the case of fault-tolerant Kahn process networks
      
185 p.
        
Doctoral thesis: Università della Svizzera italiana, 2015
      
        English
        
        
        
Technology scaling, accompanied by higher operating frequencies and the ability to integrate more functionality on the same chip, has been the driving force behind delivering higher-performance computing systems at lower cost. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as Multiprocessor System-on-Chip). However, these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. First, the growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Second, designers face a reliability wall that emerges as age-related degradation reduces the lifetime of transistors and as the probability of defects escaping post-manufacturing testing increases. In this thesis, we take on these challenges within the context of streaming applications running on network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote memory access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping and (2) application-level self-adaptation for quality management. For the former problem, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at the system level, in particular by exploiting the redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design-time adoption) as well as heuristic-based solutions to be used at run time. We assess the impact of our approach on lifetime reliability. We also propose two recovery schemes, one based on checkpoint-and-rollback and one on a rollforward technique. For the latter problem, we propose two variants of a monitor-controller-adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that this can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture.
        
- Classification: Computer science and technology
- Persistent URL: https://n2t.net/ark:/12658/srd1318454