Tutorial - ISEC 2013

6th India Software Engineering Conference
New Delhi

Feb 21-23, 2013

Mining and Summarization of Software Problem Reports

Half-day Tutorial

Time: Afternoon from 2:00 PM, 21st Feb. 2013

Short Bio: Karthik Sankaranarayanan is a Research Scientist with the Human Language Technologies department at IBM Research - India. His main interests are in statistical machine learning applied to different domains; his current work applies classification and clustering techniques to ticket analytics. He obtained his PhD in Computer Science from The Ohio State University in 2011, working at the intersection of machine learning and computer vision, where he developed novel multiple-instance learning algorithms for problems in object localization and tracking. His work was supported by the US Dept of Energy Los Alamos National Lab and the National Science Foundation. He has published several papers in top conferences in computer vision, pattern recognition and machine learning.

Abstract: An increasingly large amount of data is available nowadays in the form of bug reports and problem tickets. These reports contain a lot of useful information which can help not only in understanding the general state of affairs of a project, but also in discovering deeper root causes of problems. Gathering such information and generating succinct, meaningful summaries from these problem reports can help in more active and informed decision making in software development or software maintenance life-cycles.

To mine useful information from these reports, it is important to understand the nature and type of data in them. These reports combine structured fields (process area, application name, module, open/closed dates, etc.) with unstructured free-text data (problem description, resolution employed, etc.). A typical challenge is therefore to group reports based on similarity of content across one or more of these fields. These characteristics, along with the general data-driven nature of these problems, have guided the use of well-known machine learning techniques. We will review some of the popular techniques employed and discuss their advantages and shortcomings. An additional challenge that has not seen as much progress is summarizing the discovered groupings in ways that are not only representative of the groupings, but also concise and easy for a human user to understand. We will discuss existing techniques that have attempted to address this and explain the challenges that lie ahead.
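As a rough illustration of grouping reports by similarity across a structured field and a free-text field, the following is a minimal sketch, not a technique taken from the tutorial itself. It assumes hypothetical report records with `module` and `description` keys, a plain bag-of-words cosine similarity, and an arbitrary threshold of 0.3; the function names are illustrative only.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_reports(reports, threshold=0.3):
    """Greedy single-pass grouping (hypothetical helper).

    A report joins the first group whose first member has the same
    module (structured field) and a similar enough description
    (free-text field); otherwise it starts a new group.
    """
    vectors = [Counter(tokenize(r["description"])) for r in reports]
    groups = []  # each group is a list of indices into `reports`
    for i, report in enumerate(reports):
        for group in groups:
            seed = group[0]
            if (reports[seed]["module"] == report["module"]
                    and cosine(vectors[seed], vectors[i]) >= threshold):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# Made-up example tickets, for illustration only.
reports = [
    {"module": "login", "description": "User cannot log in, password reset email never arrives"},
    {"module": "login", "description": "Password reset email not received by user"},
    {"module": "billing", "description": "Invoice total is wrong after discount applied"},
    {"module": "billing", "description": "Wrong invoice total when discount code applied"},
    {"module": "login", "description": "Page crashes on startup with null pointer"},
]
groups = group_reports(reports)
print(groups)
```

The greedy pass depends on the order of the reports and compares only against the first member of each group, which is one of the shortcomings that motivate proper clustering algorithms on real ticket data.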

Further, certain characteristics of the data in these problem reports have posed major challenges to the success of off-the-shelf techniques. Examples include high levels of noise in the data, lack of standardized reporting, and limitations of natural language processing. We will address these aspects and discuss some of the work in the literature that seeks to overcome them.

Finally, we will discuss some of the major open problems in this area and attempt to link them to similar problems in other areas, such as NLP-based knowledge extraction from social media.

Specifically, the major topics under this area of research that this tutorial aims to cover include:

Characteristics of Sources of Software Problem Reports: discussion forums, bug reports, problem tickets
Text Analytics Challenges: classification, clustering, topic extraction
Summarization and Description of Reports: learning approaches for discovering short, concise, meaningful labels for problem reports to enable quick user interaction and understanding
Ranking Summaries for Prioritization of Investigation
Evaluation Techniques: qualitative and quantitative
Incorporating Other Sources of Relevant Information: code change history, documentation, etc.
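
To make the summarization and ranking topics above concrete, here is a small sketch under stated assumptions: each group of reports is labeled with its most distinctive terms using a simple TF-IDF-style weight computed across groups, and groups are ranked by size as a stand-in for investigation priority. This is an illustrative baseline, not the method presented in the tutorial; `label_groups`, the sample texts, and the size-based ranking are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def label_groups(groups, top_k=2):
    """Return (group_size, label_terms) pairs, largest group first.

    A term scores highly when it is frequent inside a group and
    appears in few other groups (TF-IDF-style weighting).
    """
    counts = [Counter(t for text in g for t in tokenize(text)) for g in groups]
    # Number of groups in which each term appears at least once.
    group_freq = Counter(t for c in counts for t in c)
    n = len(groups)
    summaries = []
    for g, c in zip(groups, counts):
        scores = {t: tf * math.log(n / group_freq[t]) for t, tf in c.items()}
        # Highest score first; ties broken alphabetically for determinism.
        top = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
        summaries.append((len(g), top))
    summaries.sort(key=lambda s: -s[0])  # rank by group size
    return summaries

# Made-up groups of report descriptions, for illustration only.
groups = [
    ["user invoice total wrong after discount",
     "wrong invoice total with discount code"],
    ["user password reset email never arrives",
     "password reset email not received by user",
     "user reset password link expired"],
]
summaries = label_groups(groups)
print(summaries)
```

Note that the term "user" occurs in both groups and therefore receives a weight of zero, so it never appears in a label even though it is frequent; this is the kind of discounting that keeps labels representative of a single group.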

Important Dates

~~Abstracts~~	~~Sep 15, 2012~~
~~Full Papers~~	~~Sep 23, 2012~~
~~Workshop & Tutorials Proposals~~	~~Oct 15, 2012~~
~~Notification~~	~~Dec 1, 2012~~
~~Camera-Ready Version~~	~~Jan 8, 2013~~
~~Registration~~	~~Feb 10, 2013~~
~~Conference~~	~~Feb 21-23, 2013~~

Contact

General Chairs:
Sugata Ghosal	IBM
Gautam Shroff	TCS
Program Chairs:
Satish Chandra	IBM
Nachi Nagappan	Microsoft
Webmaster:
Apoorv Narang	IIIT-Delhi