6th India Software Engineering Conference
New Delhi
Feb 21-23, 2013
Mining and Summarization of Software Problem Reports
Half-day Tutorial
Time: Afternoon from 2:00 PM, 21st Feb. 2013
Abstract: An increasingly large amount of data is now available in the form of bug reports and problem tickets. These reports contain useful information that can help not only in understanding the general state of a project, but also in discovering deeper root causes of problems. Gathering such information and generating succinct, meaningful summaries from these problem reports can support more active and informed decision making in the software development and maintenance life cycles.
To mine useful information from these reports, it is important to understand the nature and type of data in them. These reports combine structured fields (process area, application name, module, open/closed dates, etc.) with unstructured free-text data (problem description, resolution employed, etc.), so typical challenges include grouping these reports based on similarity of content across one or more of these fields. These characteristics, along with the general data-driven nature of these problems, have guided the use of well-known machine learning techniques. We will review some of the popular techniques employed and discuss their advantages and shortcomings.
An additional challenge that has not seen as much progress is summarizing the discovered groupings in ways that are not only representative of the groupings, but also concise and easy for a human user to understand. We will discuss existing techniques that have attempted to address this and explain the challenges that lie ahead. Further, certain typical characteristics of the data in these problem reports have posed major challenges to the success of off-the-shelf techniques. Examples include high levels of noise in the data, lack of standardized reporting, and limitations of natural language processing. We will address these aspects and discuss some of the work in the literature that seeks to overcome them.
Finally, we will discuss some of the major open problems in this area and attempt to link them to similar problems in other areas, such as NLP-based knowledge extraction from social media. Specifically, the major topics under this area of research that this tutorial aims to cover include: