Code Search and Idiomatic Snippet Synthesis

Event details
Date | 08.04.2016 |
Hour | 11:00 - 12:00 |
Location | |
Category | Conferences - Seminars |
by Mukund Raghothaman
Abstract
In recent years, the program analysis and synthesis communities have come to realize the value of large open-source code repositories such as GitHub and BitBucket. These repositories can greatly impact the field, from providing better real-world benchmarks for existing algorithms to facilitating entirely new techniques for code completion and anomaly detection.
In this talk, we will consider the problem of API exploration. Modern programming frameworks come with large libraries whose applications are as diverse as matching regular expressions, parsing XML files, and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. I will describe SWIM (Synthesize What I Mean), a tool which suggests code snippets given API-related natural language queries such as "generate md5 hash code". The query does not need to contain framework-specific trivia such as the type names or methods of interest.
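As a rough illustration of the kind of answer a query like "generate md5 hash code" is after, here is a minimal sketch in Python. It is not output from SWIM, and the evaluation described in this abstract targets C#; the function name `md5_hex` is made up for illustration. The point is that the query names neither the library (`hashlib`) nor the methods involved, yet the idiomatic answer is a short sequence of specific API calls.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as a lowercase hex string.

    Hypothetical helper for illustration only; MD5 is shown because the
    example query asks for it, not because it is suitable for security use.
    """
    # Hash functions operate on bytes, so encode the string first.
    data = text.encode("utf-8")
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    print(md5_hex("hello"))  # 5d41402abc4b2a76b9719d911017c592
```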
I will address three specific problems: inferring "idioms" from large code repositories, natural language processing to understand the input queries, and the architecture of the SWIM synthesizer, which allows fast response times and easy collaboration between NLP and programming language researchers.
We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.
This is joint work with Yi Wei and Youssef Hamadi during a summer internship at Microsoft Research Cambridge.
Bio
Mukund Raghothaman is a Ph.D. student at the University of Pennsylvania, advised by Rajeev Alur, and funded by the NSF ExCAPE grant.
His research goal is to make programming easier by building new programming abstractions and assistance tools. In his thesis, he designs DReX, a domain-specific language to describe stream transformations. He is now studying extensions to quantitative functions, approximate query evaluation, and applications to the static analysis of string-manipulating programs.
More broadly, he is interested in formal verification and program synthesis. Program synthesis is the problem of converting human intentions into concrete programs. The input is often vague and exploratory: he spent two summers working with Youssef Hamadi and Yi Wei on the synthesis of idiomatic code snippets for the Bing Code Search Tool at Microsoft Research Cambridge. He was also part of the team that formalized SyGuS.
In 2010, he graduated from the Indian Institute of Technology Guwahati with an undergraduate degree in computer science.
Practical information
- General public
- Free
- This event is internal
Contact
- Host: Viktor Kuncak