by sapphireblue
3617 days ago
The corpus can be acquired from companies that host coding competitions. In this paper, http://arxiv.org/pdf/1510.07211v1.pdf (I have already mentioned it many times, but it is very relevant), the researchers took their data from a similar source:

> To accomplish this goal, we leverage a dataset from a pedagogical programming online judge (OJ) system, intended for the undergraduate course, Introduction to Computing. The OJ system comprises different programming problems. Students submit their source code to a specific problem, and the OJ system judges its validity automatically (via running).

They trained a character-level seq2seq model on it, and it almost works, barring a few typos. IMHO this is an underappreciated breakthrough. With more data and a better model, generating competition-grade programs seems possible.