Information Extraction for Document Digitization

(Technical Briefings)


An enterprise system executes its processes according to well-defined rules and regulations. These rules and regulations are written in natural language, which is an unstructured representation of information. This unstructured information is the basis for conducting business, and all software systems have to follow these guidelines strictly. A Subject Matter Expert (SME) interprets the document and provides formal specifications for the implementation. Verifying whether the implemented system follows the defined specification again requires manual intervention. We present a system that converts this unstructured information into structured information, from which the formal specification can be generated automatically with minimal manual intervention. The system reduces the effort required of an SME and can generate the configurable parameters for the implementation. The system can also perform document unification.

Target Audience

Academic and industry researchers, students, and industry practitioners with exposure to information extraction, natural language processing, domain-specific languages, SBVR, and business modeling in general.

Speaker's Profile: Chandan Prakash

Chandan Prakash is a Researcher at TCS Research, Pune. His research areas include Natural Language Processing (NLP) and Deep Learning (DL). He has worked on several projects that, at an abstract level, require converting unstructured information into structured information for business models. He has published several research papers at various conferences and presented his work to wider audiences both within TCS Research and outside it. He is currently working on the Business Rule Mining project at TCS Research.

Speaker's Profile: Rohit Prakash Shere

Rohit Prakash Shere is a Researcher at TCS Research, Pune. His research interests are image processing, information retrieval, and natural language processing. He also has extensive experience as a test automation expert. With a strong interest in logical and analytical puzzle solving, he devises approaches and methods for solving challenging tasks and problems. He currently leads the work at TCS Research on text extraction from documents that takes the visual aspects of document rendering into account.

  • 90 minutes