by sapphireblue 3617 days ago
The corpus can be acquired from companies that host coding competitions. In this paper, http://arxiv.org/pdf/1510.07211v1.pdf (I have already mentioned it many times, but it is very relevant), the researchers took their data from a similar source:

>To accomplish this goal, we leverage a dataset from a pedagogical programming online judge (OJ) system, intended for the undergraduate course, Introduction to Computing. The OJ system comprises different programming problems. Students submit their source code to a specific problem, and the OJ system judges its validity automatically (via running).

They then trained a seq2seq model on it at the character level, and it almost works, barring a few typos. IMHO this is an underappreciated breakthrough. With more data and a better model, generating competition-grade programs seems possible.
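
To make "character level" concrete, here is a rough sketch of how source code could be turned into the integer sequences a seq2seq decoder would be trained to emit. This is my own illustration, not the paper's pipeline: the helper names (build_vocab, encode, decode), the special tokens, and the two toy programs are all assumptions.

```python
# Hypothetical character-level encoding of source code (illustrative only).
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"  # assumed special tokens


def build_vocab(programs):
    """Map every character seen in the corpus to an integer id."""
    chars = sorted({ch for prog in programs for ch in prog})
    itos = [PAD, SOS, EOS] + chars
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos


def encode(text, stoi):
    """Wrap the program in start/end markers, one id per character."""
    return [stoi[SOS]] + [stoi[ch] for ch in text] + [stoi[EOS]]


def decode(ids, itos):
    """Turn ids back into text, dropping the special tokens."""
    return "".join(itos[i] for i in ids if itos[i] not in (PAD, SOS, EOS))


# Toy stand-ins for student submissions from an online judge.
programs = ["int main(){return 0;}", "int x=1;"]
stoi, itos = build_vocab(programs)
ids = encode(programs[1], stoi)
assert decode(ids, itos) == programs[1]
```

With this kind of encoding the model predicts one character id at a time, so a single wrong prediction would show up as a one-character typo in otherwise plausible code.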