Doctoral thesis

Analyzing system performance with probabilistic performance annotations


142 p

Thèse de doctorat: Università della Svizzera italiana, 2020

English Understanding the performance of software is complicated. For several performance metrics, in addition to the algorithmic complexity, one must also consider the dynamics of running a program within different combinations of hardware and software environments. Such dynamical aspects are not visible from the code alone, and any kind of static analysis falls short. For example, in reality, the running time of a sort method for a list is going to be different from the expected O(nlogn) complexity if the hardware does not have enough memory to hold the entire list. Moreover, understanding software performance has become much more complex because software systems themselves continue to grow in size and complexity, and because modularity works quite well for functionality but less so for performance. In fact, the many subsystems and libraries that compose a modern software system usually guarantee well documented functional properties but rarely guarantee or even document any performance behavior. Furthermore, while functional behaviors and problems can be reasonably isolated, performance problems are often interaction problems and they are pervasive. Performance analysts typically rely on profilers to understand the behavior of software. However, traditional profilers like gprof produce aggregate information in which the essential details of input or context-specific behaviors simply get lost. Some previous attempts at creating more informative performance profiles require that the analyst provide the performance models for software components. Other performance modeling tools deduce those models automatically but consider only the abstract algorithmic complexity, and therefore fail to find or even express interesting runtime performance metrics. In this thesis, we develop the concept of probabilistic performance annotations to understand, debug, and predict the performance of software. We also introduce Freud, a tool that creates performance annotations automatically for real world C/C++ software. A performance annotation is a textual description of the expected performance of a software component. Performance is described as a function of features of the input processed by the software, as well as features of the systems on which the software is running. Performance annotations are easy to read and understand for the developer or performance analyst, thanks to the use of concrete performance metrics such as running time, measured in seconds, and concrete features, such as the real variables as defined in the source code of the program (e.g., the variable that stores the length of a list). Freud produces performance annotations automatically using dynamic analysis. In particular, Freud instruments a binary program written in C/C++ and collects information about performance metrics and features from the running program. Such information is then processed to derive probabilistic performance annotations. Freud computes regressions and clusters to create regression trees and mixture models that describe complex, multi-modal performance behaviors. We illustrate our approach to performance analysis and the use of Freud on three complex systems-—the ownCloud distributed storage service; the MySQL database system; and the x264 video encoder library and application-—producing non-trivial characterizations of their performance.
  • English
Computer science and technology
License undefined
Persistent URL

Document views: 126 File downloads:
  • 2020INFO024.pdf: 136