By Martin Vechev, ETH Zurich, Switzerland, martin.vechev@inf.ethz.ch | Eran Yahav, Technion, Israel, yahave@cs.technion.ac.il
The vast amount of code available on the web is increasing daily. Open-source hosting sites such as GitHub contain billions of lines of code. Community question-answering sites provide millions of code snippets with corresponding text and metadata. The amount of code available in executable binaries is even greater. Collectively, these increasing amounts of code have been referred to as “Big Code”. In this monograph, we cover some of the recent research trends on leveraging “Big Code” to perform various programming tasks that are difficult to accomplish with traditional techniques.
Programming with “Big Code” discusses several recent applications, describes the dimensions of the problem, and illustrates how each application and example system fits these dimensions. It then selects one of the applications, statistical code completion, that captures various aspects of the problem of learning from “Big Code”, and illustrates each of these aspects in detail.