IC Colloquium : Statistical Analysis of Computer Program Text: Machine Learning and Natural Language Processing Meets Software Engineering

Event details

Date 09.11.2015
Hour 16:15 - 17:30
Location
Category Conferences - Seminars
By : Charles Sutton - University of Edinburgh

Video of his talk

Abstract :
Billions of lines of source code have been written, many of which are freely available on the Internet. This code contains a wealth of implicit knowledge about how to write software that is easy to read, avoids common bugs, and uses popular libraries effectively.

We want to extract this implicit knowledge by analyzing source code text.
To do this, we employ the same tools from machine learning and natural language processing that have been applied successfully to natural language text.
After all, source code is also a means of human communication.

We present three new software engineering tools inspired by this insight:

* Naturalize, a system that learns local coding conventions.
It proposes revisions to names and to formatting so as to make code more consistent.
A version that uses word embeddings has shown promise for suggesting names for methods and classes.

* Data mining methods have been widely applied to summarize patterns in how programmers invoke libraries and APIs. We present a new method for mining market basket data, based on a simple generative probabilistic model, that resolves fundamental statistical pathologies that lurk in popular current data mining techniques.

* HAGGIS, a system that learns local recurring syntactic patterns, which we call idioms. HAGGIS accomplishes this using a nonparametric Bayesian tree substitution grammar, and is delicious with whisky sauce.
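
To make the first idea above concrete, here is a minimal sketch of how a statistical language model over code tokens could rank candidate identifier names by how well they fit a codebase's conventions. It assumes a smoothed bigram model and a tiny hand-written corpus; the function names (`train_bigram`, `log_prob`, `rank_names`) and the `NAME` placeholder are hypothetical and are not taken from Naturalize itself, whose actual model the abstract does not describe.

```python
import math
from collections import Counter


def train_bigram(token_lines):
    """Count unigrams and bigrams over tokenized lines of code."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in token_lines:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of one tokenized line."""
    vocab = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )


def rank_names(template, candidates, unigrams, bigrams):
    """Rank candidate names by likelihood when substituted for 'NAME'."""
    scored = []
    for name in candidates:
        tokens = [name if t == "NAME" else t for t in template]
        scored.append((log_prob(tokens, unigrams, bigrams), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Toy corpus (an assumption for illustration): loops mostly use `i`.
corpus = [
    "for ( int i = 0 ; i < n ; i ++ )".split(),
    "for ( int i = 0 ; i < len ; i ++ )".split(),
    "for ( int j = 0 ; j < n ; j ++ )".split(),
]
unigrams, bigrams = train_bigram(corpus)
template = "for ( int NAME = 0 ; NAME < n ; NAME ++ )".split()
ranking = rank_names(template, ["idx", "i", "j"], unigrams, bigrams)
print(ranking)
```

On this toy corpus the name used most often in the same context ranks first, which is the sense in which such a model "learns" a local convention. A real system would need a proper tokenizer, a far larger corpus, and a threshold for deciding when a suggestion is confident enough to show.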

Bio :
Charles Sutton is a Reader (equivalent to Associate Professor: http://bit.ly/1W9UhqT) at the University of Edinburgh. He is interested in a broad range of applications of probabilistic machine learning, including NLP, analysis of computer systems, software engineering, sustainable energy, and exploratory data analysis.

Dr Sutton completed his PhD at the University of Massachusetts Amherst, working with Andrew McCallum. He did postdoctoral research at the University of California, Berkeley, working with Michael I. Jordan.

He is Deputy Director of the EPSRC Centre for Doctoral Training in Data Science at the University of Edinburgh.

More information

Practical information

  • General public
  • Free
  • This event is internal

Contact

  • Host : Jim Larus