Doctoral thesis

Mining unstructured software data


224 p

Thèse de doctorat: Università della Svizzera italiana, 2013

English Our thesis is that the analysis of unstructured data supports software understanding and evolution analysis, and complements the data mined from structured sources. To this aim, we implemented the necessary toolset and investigated methods for exploring, exposing, and exploiting unstructured data.To validate our thesis, we focused on development email data. We found two main challenges in using it to support program comprehension and software development: The disconnection between emails and code artifacts and the noisy and mixed-language nature of email content. We tackle these challenges proposing novel approaches. First, we devise lightweight techniques for linking email data to code artifacts. We use these techniques for creating a tool to support program comprehension with email data, and to create a new set of email based metrics to improve existing defect prediction approaches. Subsequently, we devise techniques for giving a structure to the content of email and we use this structure to conduct novel software analyses to support program comprehension. In this dissertation we show that unstructured data, in the form of development emails, is a valuable addition to structured data and, if correctly mined, can be used successfully to support software engineering activities.
  • English
Computer science and technology
License undefined
Persistent URL

Document views: 110 File downloads:
  • 2013INFO003.pdf: 69