Hacker News
by chris_va 3617 days ago
I suspect we'll see supervised seq2seq generated code first.

Like:

---

Programmer inputs on left:

compute std dev of x please

AI on right proposes edit to code:

+ import numpy as np

...

+ stddev_x = np.std(x)

---

Not super complicated to start with, but you can see where it will go from there.
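Here is a minimal sketch of what one supervised training pair could look like under this framing. It assumes each example pairs an English request with the code before and after the edit. The `example` record and the `proposed_edit` helper are hypothetical. Python's standard `difflib.ndiff` marks added lines with a `+ ` prefix, which matches the edit format in the mock-up, so the model's target could be just those added lines.

```python
import difflib

# Hypothetical training example: an English request paired with the code
# before and after the edit. Field names are illustrative, not from any dataset.
example = {
    "request": "compute std dev of x please",
    "before": ["x = [1.0, 2.0, 3.0, 4.0]\n"],
    "after": [
        "import numpy as np\n",
        "x = [1.0, 2.0, 3.0, 4.0]\n",
        "stddev_x = np.std(x)\n",
    ],
}

def proposed_edit(before, after):
    """Return only the added lines, in the '+ ' style shown in the mock-up."""
    return [line for line in difflib.ndiff(before, after) if line.startswith("+ ")]

for line in proposed_edit(example["before"], example["after"]):
    print(line, end="")
```

A real system would probably need more context than this, such as the surrounding file. The basic shape of each pair would stay the same, though: a request goes in and a diff comes out.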


How in the world do you train that? Where would the corpus of English <-> code mappings come from?
The corpus can be acquired from companies that host coding competitions. In this paper, http://arxiv.org/pdf/1510.07211v1.pdf (I've mentioned it many times before, but it is very relevant), the researchers took their data from a similar source:

>To accomplish this goal, we leverage a dataset from a pedagogical programming online judge (OJ) system, intended for the undergraduate course, Introduction to Computing. The OJ system comprises different programming problems. Students submit their source code to a specific problem, and the OJ system judges its validity automatically (via running).
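The judging step in the quote ("judges its validity automatically (via running)") could be sketched as the hypothetical filter below. It is not the paper's actual pipeline. The `problem` and `submissions` records, the stdin/stdout test format, and `passes_all_tests` are all assumptions made for illustration. Each submission runs on every test input via `subprocess.run`, and only the ones that produce the expected output are kept as (problem statement, code) pairs.

```python
import subprocess
import sys

# Hypothetical judge records: one problem with its test cases and two
# student submissions. The record layout is illustrative only.
problem = {
    "statement": "Read two integers and print their sum.",
    "tests": [("1 2\n", "3"), ("10 -4\n", "6")],
}
submissions = [
    "a, b = map(int, input().split())\nprint(a + b)\n",
    "a, b = map(int, input().split())\nprint(a - b)\n",  # wrong answer
]

def passes_all_tests(source, tests, timeout=5.0):
    """Run a submission on every test input, the way an online judge would."""
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        # Reject crashes and wrong answers alike.
        if result.returncode != 0 or result.stdout.strip() != expected:
            return False
    return True

# Keep only accepted submissions as (statement, code) training pairs.
pairs = [
    (problem["statement"], src)
    for src in submissions
    if passes_all_tests(src, problem["tests"])
]
print(len(pairs))  # → 1
```

Filtering on judge verdicts means the target side of every pair is a program that actually ran correctly on the problem's tests, which is what makes an online judge a convenient data source.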

They then trained a character-level seq2seq model on it, and it almost works, barring a few typos. IMHO this is an underappreciated breakthrough. With more data and a better model, generating competition-grade programs seems possible.
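In general, a character-level seq2seq model reads and emits individual characters rather than words or code tokens. The sketch below covers only the preprocessing side. It assumes a single vocabulary shared by inputs and outputs, plus the hypothetical special tokens `<pad>`, `<s>` and `</s>`. The network itself (for example an encoder-decoder RNN) is out of scope, and none of this is claimed to match the paper's setup.

```python
# Special tokens; the names are illustrative assumptions.
PAD, SOS, EOS = "<pad>", "<s>", "</s>"

def build_vocab(texts):
    """Map every character seen in the corpus (plus special tokens) to an id."""
    chars = sorted({ch for text in texts for ch in text})
    itos = [PAD, SOS, EOS] + chars
    return {tok: i for i, tok in enumerate(itos)}, itos

def encode(text, stoi, add_markers=False):
    """Turn a string into character ids; targets get start/end markers."""
    ids = [stoi[ch] for ch in text]  # raises KeyError on unseen characters
    if add_markers:
        ids = [stoi[SOS]] + ids + [stoi[EOS]]
    return ids

def decode(ids, itos):
    """Turn ids back into text, dropping the special tokens."""
    return "".join(itos[i] for i in ids if itos[i] not in (PAD, SOS, EOS))

def pad_batch(seqs, pad_id):
    """Right-pad sequences so a batch has a uniform length."""
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]

# Hypothetical (description, program) pairs.
pairs = [
    ("compute std dev of x", "stddev_x = np.std(x)"),
    ("sum of a and b", "print(a + b)"),
]
stoi, itos = build_vocab([t for pair in pairs for t in pair])
sources = [encode(src, stoi) for src, _ in pairs]
targets = [encode(tgt, stoi, add_markers=True) for _, tgt in pairs]
batch = pad_batch(targets, stoi[PAD])
print(decode(batch[1], itos))  # → print(a + b)
```

Because every output step picks a single character, one wrong choice yields a malformed identifier or keyword. That is consistent with the "few typos" failure mode described above.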