Node-level performance modeling of sparse factorization solver
      
      
        
      
      
      
      
      
      
      
      
      
      
      
      
        90 p
        
        
      
      
      
      
      
      
      
      Thèse de doctorat: Università della Svizzera italiana, 2021
      
      
      
      
      
      
      
       
      
      
      
        
        English
        
        
        
          Solving large sparse linear systems is at the heart of many application problems arising from scientific and engineering problems. These systems are often  solved by direct factorization solvers, especially when the system needs to be solved for multiple right-hand sides or when a high numerical precision is  required. Direct solvers are based on matrix factorization, which is then followed by forward and backward substitution to obtain a precise solution. The  factorization is the most computationally intensive step, but it has to be computed only once for a given matrix. Then the system is solved with forward and  backward substitution for every right-hand side. Performance modeling of algorithms involved in solving these linear systems reveals the computational  bottlenecks, which can guide node-level performance optimizations and shows the best performance that can be achieved on given architecture. In this  thesis we investigate and analyze the performance of the forward/backward solution process of the PARDISO direct sparse solver and present a detailed  performance analysis for its sparse solver kernel. This analysis is based on the Berkeley roofline model, a model that is widely used to predict the upper  bound of a code based on processor peak performance and memory bandwidth. We establish a modified roofline model that captures the serial and parallel  execution phases which allows us to predict the in-socket scaling over the processor cores. The distinction of serial and parallel execution is important as the  amount of the serial fraction depends on the matrix used and can have a significant negative impact on performance. We compared the roofline model with  an alternative Erlangen ECM model and provide discussion on usability and modeling capabilities of both models. The model predictions are compared with  various measurements for a representative set of sparse matrices on different x86_64 processors. The performance analysis and modeling performed in this  work are limited to a single node, however, the code considered here is also a building block for the MPI parallel version. Hence, also the distributed memory  implementation of PARDISO will profit from any enhancement achieved.
        
        
       
      
      
      
        
        
        
        
        
        
        
        
        
        
        
        
        
        
        - 
          Language
        
- 
          
        
- 
          Classification
        
- 
          
              
                
                  Computer science and technology
                
              
            
          
        
- 
          License
        
- 
          
        
- 
          Identifiers
        
- 
          
        
- 
          Persistent URL
        
- 
          https://n2t.net/ark:/12658/srd1319134