
by bglazer 2227 days ago

I've read a fair number of papers on neural program synthesis lately. To me, these seem to be obviously cherry-picked examples, so you can't really evaluate the whole system based on them.

However, this is fairly impressive for a couple of reasons. First, the system constructs programs from natural language descriptions, rather than examples of input-output pairs or a formal specification, which are the most common settings for program synthesis. Second, they're generating full-blown Python, not a smaller, domain-specific language.

Finally, and this is pretty mind-blowing, there is the seamless, idiomatic use of loops, branches, and function calls. I haven't seen previous program synthesis tools generate such complex code. They're typically limited to simple linear programs of fewer than about 100 lines. Complex control flow and function calls are still beyond their reach for the most part.

I'm not an active researcher in neural program synthesis, so my statements may not reflect the current state of the art.

I honestly thought that the most promising route forward for program synthesis would be a model that incorporated knowledge of the syntax and semantics of code. Most likely, a model that manipulated, or at least had some view of, the program's AST. This seems to be just throwing a giant Transformer model at GitHub.
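To make "some view of the program's AST" concrete, here is a minimal sketch using Python's standard `ast` module. It only illustrates what structural information a syntax-aware model could be given; the helper names `structure_summary` and `is_valid_python` are made up for this example and do not describe any particular synthesis system.

```python
import ast


def structure_summary(source):
    """Parse Python source and count how often each AST node type occurs."""
    tree = ast.parse(source)  # raises SyntaxError if the text is not valid Python
    counts = {}
    for node in ast.walk(tree):
        name = type(node).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts


def is_valid_python(source):
    """Return True if the text parses as Python, False otherwise."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# A small program with a loop, a branch, and a return statement.
source = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            s += x\n"
    "    return s\n"
)

summary = structure_summary(source)
print({k: summary[k] for k in ("FunctionDef", "For", "If", "Return")})
print(is_valid_python("def f(:"))
```

A plain language model sees only the token stream; the tree above exposes the loop and branch structure directly, and the parser rejects malformed text outright.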

Fine tuning a vanilla language model on a giant corpus of code feels like a dead end for the field, long-term. It seems obvious to me that humans are doing something more than just statistical pattern recognition and generation when we write and reason about code.

Then again, it's hard to argue with results. I'm sure lots of pre-neural-network voice recognition researchers were in love with the elegance of their hidden Markov models.

Edit: Also, everyone should go try the FlashFill feature in Microsoft Excel. As far as I know, it's the only example of program synthesis shipped in a consumer-facing production system, and it works shockingly well.

3 comments

IdiocyInAction 2227 days ago

> Fine tuning a vanilla language model on a giant corpus of code feels like a dead end for the field, long-term. It seems obvious to me that humans are doing something more than just statistical pattern recognition and generation when we write and reason about code.

Yeah, this is the main reason why I would be interested in more examples. But, if this thing was trained on all of GitHub, I could imagine that it could come up with decent-looking code for a lot of examples; a beefy, smarter Google with some rudimentary contextual understanding, if you will. Still, the presence of any mistakes is a no-go and I'd be really interested in how it reacts to more realistic, specific requirements.

But yeah, I'd figure a model for code generation would have to have some kind of knowledge of syntax and semantics, rather than doing pure statistical pattern matching, to be of any real use. It would have to not only generate but also debug its code (I wonder whether you could do that purely with statistical pattern recognition). I might be wrong, of course, but I would be surprised if that is enough to write complex code.


MauranKilom 2227 days ago

Five years ago we were already here: https://karpathy.github.io/2015/05/21/rnn-effectiveness/

Calling the field "statistical pattern matching" might be underselling it a bit, even if technically accurate on some level. I mean, syntax/semantics are clearly not the problem, those are the easiest to learn (see the paper above). If anything, I'm scared of it writing syntactically correct nonsense (or even worse, subtly-flawed-but-correct-looking code).


YeGoblynQueenne 2227 days ago

>> Edit: Also, everyone should go try the FlashFill feature in Microsoft Excel. As far as I know, it's the only example of program synthesis shipped in a consumer-facing production system, and it works shockingly well.

And it's not a giant language model trained on a gigantic dataset. Rather, if memory serves, it's a bunch of task-specific DSLs and rules, all hand-written from scratch.


GregarianChild 2227 days ago

I don't know how FlashFill works in 2020, but from [1] I learn that the original implementation was a brute-force enumeration of a small DSL for string manipulation, with clever heuristics, along the lines of CDCL (conflict-driven clause learning, as used in SAT solvers), for speeding up common cases. This was (and still is) the state-of-the-art approach to programming-by-example program synthesis.

[1] O. Polozov, S. Gulwani, FlashMeta: A Framework for Inductive Program Synthesis. https://www.microsoft.com/en-us/research/wp-content/uploads/...
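For readers who haven't seen programming-by-example before, the brute-force part can be sketched in a few lines of Python. This is a toy under loud assumptions: the six string primitives and the names `PRIMITIVES`, `run`, and `synthesize` are invented for illustration, this is not FlashFill's actual DSL, and it has none of the pruning heuristics mentioned above.

```python
from itertools import product

# Hypothetical toy DSL: each primitive maps a string to a string.
PRIMITIVES = {
    "lower": str.lower,
    "upper": str.upper,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0] if s.split() else "",
    "last_word": lambda s: s.split()[-1] if s.split() else "",
    "initial": lambda s: s[:1],
}


def run(program, text):
    """Apply the primitives named in `program` to `text`, left to right."""
    for name in program:
        text = PRIMITIVES[name](text)
    return text


def synthesize(examples, max_depth=3):
    """Enumerate pipelines by increasing length and return the first one
    consistent with every (input, output) example, or None if none is found."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None


# Two input-output examples, as a user might type into adjacent cells.
examples = [("Ada Lovelace", "lovelace"), ("Alan Turing", "turing")]
program = synthesize(examples)
print(program)
print(run(program, "Grace Hopper"))
```

With these two examples the search returns `('lower', 'last_word')` and generalizes to `hopper` for the unseen input. The search space grows exponentially with pipeline length, which is why a real system needs a carefully restricted DSL and aggressive pruning.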


YeGoblynQueenne 2227 days ago

That's a nice, formal way of putting it, thank you :)

(Sorry, I really should have refreshed my memory on Gulwani et al. I think I've even linked the paper in an HN comment before.)

Oh, btw, I doubt they're doing this with a language model nowadays. Unless FlashFill has suddenly started filling cells for email addresses with haikus etc...


MauranKilom 2227 days ago

I am also hedging my hopes of this working on "more realistic" scenarios. It does produce code that looks natural to us, but I expect it to show clear "seams" where its understanding of something isn't deep enough.

But maybe this is just a question of how much compute (and network size/"depth") you invest. On a certain level we're also just some recurrent LSTM :)
