
Deep Learning On Code with an Unbounded Vocabulary

EasyChair Preprint 466

11 pages. Date: August 29, 2018

Abstract

A major challenge when using techniques from Natural Language Processing for supervised learning on computer program source code is that many words in code are neologisms. Reasoning over such an unbounded vocabulary is not something NLP methods are typically suited for. We introduce a deep model that contends with an unbounded vocabulary (at training or test time) by embedding new words as nodes in a graph as they are encountered and processing the graph with a Graph Neural Network.
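The idea of adding a graph node for each new word as it is encountered can be illustrated with a small sketch. This is not the authors' architecture: the function names (`char_features`, `build_graph`, `message_pass`), the character-bucket featurization, the feature size `DIM`, and the unweighted mean aggregation are all assumptions chosen only to show how a graph can absorb words never seen before without a fixed vocabulary table.

```python
from collections import defaultdict

DIM = 8  # illustrative feature size (assumption, not from the paper)


def char_features(word, dim=DIM):
    """Featurize any string from its characters, so no vocabulary lookup is needed."""
    vec = [0.0] * dim
    for ch in word:
        vec[ord(ch) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def build_graph(tokens):
    """Create one node per token occurrence and one shared node per distinct word."""
    features = []   # node id -> feature vector
    edges = []      # undirected (occurrence node, word node) pairs
    word_node = {}  # word -> id of its shared word node
    for tok in tokens:
        occ = len(features)
        features.append([0.0] * DIM)  # occurrence node starts with empty features
        if tok not in word_node:      # new word: add its node on the fly
            word_node[tok] = len(features)
            features.append(char_features(tok))
        edges.append((occ, word_node[tok]))
    return features, edges, word_node


def message_pass(features, edges):
    """One round of mean-aggregation message passing (no learned weights)."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = []
    for node, feat in enumerate(features):
        msgs = [features[n] for n in neighbors[node]] + [feat]
        updated.append([sum(col) / len(msgs) for col in zip(*msgs)])
    return updated


tokens = ["foo", "=", "foo", "+", "bar_baz"]
features, edges, word_node = build_graph(tokens)
updated = message_pass(features, edges)
```

In this sketch both occurrences of `foo` link to the same word node, so after one round of message passing they carry identical features. A real Graph Neural Network would replace the plain averaging with learned update functions, and a real model would likely include further edges (for example syntax or data-flow relations) that are omitted here.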

Keyphrases: Graph Neural Network, Graph Neural Networks, Learning Representation, Natural Language Processing, abstract syntax tree, ast augmented ast, augmented ast, control flow, deep learning, fixed vocabulary, machine learning, neural network, source code, unbounded vocabulary, variable naming task

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:466,
  author    = {Milan Cvitkovic and Badal Singh and Anima Anandkumar},
  title     = {Deep Learning On Code with an Unbounded Vocabulary},
  doi       = {10.29007/bc6w},
  howpublished = {EasyChair Preprint 466},
  year      = {EasyChair, 2018}}