ISEC 2013
6th India Software Engineering Conference
New Delhi
Feb 21-23, 2013

Mining and Summarization of Software Problem Reports

Half-day Tutorial


Time: Afternoon from 2:00 PM, 21st Feb. 2013

Short Bio: Karthik Sankaranarayanan is a Research Scientist with the Human Language Technologies department at IBM Research - India. His main interests are in statistical machine learning applied to different domains, and his current work applies classification and clustering techniques to ticket analytics. He obtained his PhD in Computer Science from The Ohio State University in 2011, working at the intersection of machine learning and computer vision, where he developed novel multiple-instance learning algorithms for problems in object localization and tracking. His work was supported by the US Dept of Energy Los Alamos National Lab and the National Science Foundation. He has published several papers in top conferences in computer vision, pattern recognition and machine learning.

Abstract: An increasingly large amount of data is available nowadays in the form of bug reports and problem tickets. These reports contain a lot of useful information which can help not only in understanding the general state of affairs of a project, but also in discovering deeper root causes of problems. Gathering such information and generating succinct, meaningful summaries from these problem reports can help in more active and informed decision making in software development or software maintenance life-cycles.

To mine useful information from these reports, it is important to understand the nature and type of data in them. These reports contain a combination of structured fields (process area, application name, module, open/closed dates, etc.) and unstructured free-text data (problem description, resolution employed, etc.). A typical challenge is therefore to group these reports based on similarity of content across one or more of these fields. These characteristics, along with the general data-driven nature of these problems, have guided the use of well-known machine learning techniques. We will review some of the popular techniques employed and discuss their advantages and shortcomings. An additional challenge that has not seen as much progress is the task of summarizing the discovered groupings in ways that are not only representative of the groupings, but also concise and easy for a human user to understand. We will discuss existing techniques that have attempted to address this and explain the challenges that lie ahead.
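As a rough illustration of grouping reports on both structured and free-text fields, here is a minimal sketch. It is not a technique taken from the tutorial: the field names (`id`, `module`, `description`), the greedy single-pass strategy, and the threshold of 0.3 are all assumptions chosen for brevity. A report joins an existing group only if its module matches and its description is close, by cosine similarity over word counts, to the description of the report that started the group.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric word tokens from free text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_reports(reports, threshold=0.3):
    """Greedy single-pass grouping (an assumed, simplistic strategy).

    A report joins the first group whose founding report has the same
    module (structured field) and a similar description (free text).
    """
    groups = []
    for r in reports:
        vec = Counter(tokens(r["description"]))
        for g in groups:
            if g["module"] == r["module"] and cosine(vec, g["vec"]) >= threshold:
                g["members"].append(r["id"])
                break
        else:
            # No compatible group found: this report starts a new one.
            groups.append({"module": r["module"], "vec": vec, "members": [r["id"]]})
    return [g["members"] for g in groups]


# Hypothetical reports, invented for illustration only.
reports = [
    {"id": 1, "module": "login", "description": "User cannot login after password reset"},
    {"id": 2, "module": "login", "description": "Login fails after password reset email"},
    {"id": 3, "module": "billing", "description": "Invoice total is wrong for annual plan"},
    {"id": 4, "module": "login", "description": "Page layout broken on mobile browser"},
]
print(group_reports(reports))  # [[1, 2], [3], [4]]
```

Reports 1 and 2 share a module and most of their vocabulary, so they fall into one group; report 4 shares the module but none of the wording, so it stays separate. A real system would likely replace the greedy pass with a proper clustering algorithm and a weighted text representation.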

Further, certain typical characteristics of the data in these problem reports have made it difficult for off-the-shelf techniques to succeed. Examples include high levels of noise in the data, lack of standardization in reporting, and the limitations of natural language processing. We will address these aspects and discuss some of the work in the literature that seeks to overcome them.
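One common way to reduce this kind of noise before any analysis is to mask highly variable tokens so that otherwise identical reports look alike. The sketch below is only an assumption about what such a step might look like: the patterns (IPv4-like addresses, ISO dates, `ABC-123` style ticket identifiers, bare numbers) and the placeholder tokens are hypothetical and would need tuning for real ticket data.

```python
import re

# Order matters: specific patterns run before the generic number mask.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),   # IPv4-like addresses
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "<date>"),       # ISO-style dates
    (re.compile(r"\b[A-Z]{2,}-\d+\b"), "<ticket>"),         # e.g. INC-4521
    (re.compile(r"\b\d+\b"), "<num>"),                      # any remaining number
]


def normalize(text):
    """Mask variable tokens, collapse whitespace, and lowercase the text."""
    for pattern, placeholder in _MASKS:
        text = pattern.sub(placeholder, text)
    return re.sub(r"\s+", " ", text).strip().lower()


print(normalize("Server 10.0.0.12 DOWN since 2013-02-21,  see INC-4521"))
# server <ip> down since <date>, see <ticket>
```

After masking, two reports that differ only in the affected host or date map to the same string, which makes similarity-based grouping less sensitive to incidental details.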

Finally, we will discuss some of the major open problems in this area and attempt to link them to similar problems in other areas, such as NLP-based knowledge extraction from social media.

Specifically, the major topics under this area of research that this tutorial aims to cover include:

  • Characteristics of Sources of Software Problem Reports
      • Discussion forums, bug reports, problem tickets
  • Text Analytics Challenges
      • Classification, clustering, topic extraction
  • Summarization and Description of Reports
      • Learning approaches for discovering short, concise, meaningful labels for problem reports to enable quick user interaction and understanding
  • Ranking Summaries for Prioritization of Investigation
  • Evaluation techniques – qualitative and quantitative
  • Incorporating other sources of relevant information
      • Code change history, documentation, etc.
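To make the idea of short labels for groups of reports concrete, here is a minimal sketch. It uses a simple term-weighting heuristic (how many reports in the group contain a term, scaled by how rare the term is across all reports) and is not one of the learning approaches covered in the tutorial. The stopword list, the scoring formula, and the sample groups are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "is", "on", "for", "after", "to", "of", "in"}


def terms(text):
    """Set of distinct non-stopword tokens in one report description."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def label_groups(groups, top_k=2):
    """Pick top_k label terms per group.

    groups maps a group name to a list of description strings.
    score(term) = (reports in group containing term)
                  * log((1 + total reports) / reports overall containing term)
    Ties are broken alphabetically so the result is deterministic.
    """
    all_docs = [d for docs in groups.values() for d in docs]
    n = len(all_docs)
    overall_df = Counter(t for d in all_docs for t in terms(d))
    labels = {}
    for name, docs in groups.items():
        group_df = Counter(t for d in docs for t in terms(d))
        score = {t: c * math.log((1 + n) / overall_df[t]) for t, c in group_df.items()}
        ranked = sorted(score, key=lambda t: (-score[t], t))
        labels[name] = ranked[:top_k]
    return labels


# Hypothetical groups of report descriptions.
groups = {
    "g1": ["User cannot login after password reset",
           "Login fails after password reset email"],
    "g2": ["Invoice total is wrong for annual plan",
           "Annual plan invoice shows wrong tax"],
}
labels = label_groups(groups)
print(labels)  # {'g1': ['login', 'password'], 'g2': ['annual', 'invoice']}
```

Labels built this way are representative but not necessarily readable as phrases, which is part of why producing concise, human-friendly summaries remains a harder problem than the grouping itself.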


Important Dates
Abstracts Sep 15, 2012
Full Papers Sep 23, 2012
Workshop & Tutorials Proposals Oct 15, 2012
Notification Dec 1, 2012
Camera-Ready Version Jan 8, 2013
Registration Feb 10, 2013
Conference Feb 21-23, 2013
 
Contact
General Chairs:
Sugata Ghosal, IBM
Gautam Shroff, TCS
Program Chairs:
Satish Chandra, IBM
Nachi Nagappan, Microsoft
Webmaster:
Apoorv Narang, IIIT-Delhi