Hacker News
by gwern 2227 days ago
I think you are underselling the potential of a model which deeply understands programming. Imagine combining such a model with something like AutoML-Zero: https://arxiv.org/abs/2003.03384 It may not be 'creative', but used as tab-completion, it's not being rewarded or incentivized or used in any way which would expose its abilities towards creating a new sort algorithm.
3 comments

I agree on the tab-completion part. Something like Gmail's smart-compose could have potentially huge benefits here.

But I'm not sure about the "deeply understand programming" part. Language modelling and "AI", in their current form, uncover only statistical correlations and barely scratch the surface of what "understanding" is. This has restricted the deployment of the majority of academic research into the real world, and this, I believe, is no different and will work only in constrained settings.

Edit: typo

It would be nice to have an AI that could write unit tests, or look over your code and understand and explain where you might have bugs.
>> It would be nice to have an AI that could [write unit tests, or] look over your code and understand and explain where you might have bugs.

What you're describing (outside of the square brackets) is algorithmic debugging:

https://en.wikipedia.org/wiki/Algorithmic_program_debugging

It was introduced in the PhD thesis of Ehud Shapiro. There's been a steady trickle of research work since then but it's never formed into a strong current, if I may. One reason for that is of course that Shapiro's thesis was published in 1983. So it's one of the research directions that was cut short by the last AI winter. Lessons to be learned.

Shapiro's thesis is one of two doctoral theses that became the precursors to Inductive Logic Programming, a field at the intersection of logic programming and machine learning. ILP algorithms learn programs from examples and "background knowledge" (i.e. a library of existing programs used as building blocks for new, learned programs).

The way that algorithmic debugging works is that it finds differences between the intended "model" (in logical terms: the consequences) of a program and its actual model. An algorithm that can do that can also walk back up the AST of a program the other way and produce a correct program from examples of its intended inputs and outputs.
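To make the idea concrete, here is a minimal sketch of the single-stepping flavour of algorithmic debugging, written in Python rather than Prolog. Everything in it is an assumption made for illustration: the `Call` record, the `traced` decorator, the buggy `isort`/`insert` pair and the `oracle` (which stands in for the user answering "is this result what you intended?"). It is not Shapiro's system, only the core search: find a call whose result disagrees with the intended model while all of its sub-calls agree.

```python
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class Call:
    """One node of the computation tree: a call, its result, its sub-calls."""
    name: str
    args: tuple
    result: object = None
    children: list = field(default_factory=list)


_stack = []  # calls currently in progress
roots = []   # finished top-level calls


def traced(fn):
    """Record every call to fn as a node in the computation tree."""
    @wraps(fn)
    def wrapper(*args):
        node = Call(fn.__name__, args)
        (_stack[-1].children if _stack else roots).append(node)
        _stack.append(node)
        try:
            node.result = fn(*args)
        finally:
            _stack.pop()
        return node.result
    return wrapper


@traced
def insert(x, xs):
    # Deliberate bug: the comparison should be `x <= xs[0]`.
    if not xs or x >= xs[0]:
        return [x] + xs
    return [xs[0]] + insert(x, xs[1:])


@traced
def isort(xs):
    if not xs:
        return []
    return insert(xs[0], isort(xs[1:]))


def oracle(call):
    """Stand-in for the user: does this result belong to the intended model?"""
    if call.name == "isort":
        return call.result == sorted(call.args[0])
    x, xs = call.args
    return call.result == sorted([x] + xs)


def find_buggy_call(node, oracle):
    """Return a call with a wrong result whose sub-calls are all correct."""
    if oracle(node):
        return None  # this subtree agrees with the intended model
    for child in node.children:
        culprit = find_buggy_call(child, oracle)
        if culprit is not None:
            return culprit
    return node  # wrong result, correct children: the bug is in this body


isort([2, 1])
culprit = find_buggy_call(roots[-1], oracle)
print(culprit.name, culprit.args, culprit.result)
```

On this input the search stops at the call `insert(2, [1])`, which returned `[2, 1]`: its caller is wrong too, but only because it trusted `insert`.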

That's the kind of stuff I study. Hence my comment above about lack of innovation etc. It's possible to automatically create novel programs with complex structures (recursion and invented sub-programs) and even discover new algorithms in the process and so on, and we know ways to do that right now. But the way to do it is not with a language model trained to predict the next character in a sequence.

As to writing unit tests, the way that most ILP algorithms work is that you give them a set of examples of the inputs and outputs of the program you want to write (e.g. "droplast([alice,and,bob,sitting,on,the,tree], [alice,and,bob,sitting,on,the])") and they write the program for you. I like to think of it as a kind of automatic TDD.
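A toy illustration of the "examples in, program out" workflow, using the droplast example above. To be clear about the assumptions: this is a brute-force enumeration over pipelines of background functions, not an actual ILP algorithm (real ILP systems learn logic programs and can invent recursive sub-programs), and the `BACKGROUND` library, the second example and all function names are made up for the sketch.

```python
from itertools import product

# "Background knowledge": a hypothetical library of existing building blocks.
BACKGROUND = {
    "reverse": lambda xs: xs[::-1],
    "tail": lambda xs: xs[1:],
    "head_list": lambda xs: xs[:1],
}

# Input/output examples of the program we want, i.e. the "unit tests".
EXAMPLES = [
    (["alice", "and", "bob", "sitting", "on", "the", "tree"],
     ["alice", "and", "bob", "sitting", "on", "the"]),
    (["a", "b"], ["a"]),
]


def run(pipeline, xs):
    """Apply the named background functions to xs, left to right."""
    for name in pipeline:
        xs = BACKGROUND[name](xs)
    return xs


def learn(examples, max_len=3):
    """Return the shortest pipeline consistent with every example, or None."""
    for length in range(1, max_len + 1):
        for pipeline in product(BACKGROUND, repeat=length):
            if all(run(pipeline, inp) == out for inp, out in examples):
                return pipeline
    return None


droplast = learn(EXAMPLES)
print(droplast)
```

With this library the search settles on reverse, then tail, then reverse, which is one way to drop the last element. The examples double as the test suite, which is the "automatic TDD" flavour described above.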

> look over your code and understand and explain where you might have bugs.

This would certainly be interesting. I'm not aware of active research going on in this area (any pointers would be helpful!).

This would require an agent to have thorough understanding of the logic you're trying to implement, and locate the piece of code where it silently fails. For this you'd again need a training dataset where the input is a piece of code and the supervision signal (the output) is location of the bug. I could imagine some sort of self-supervision to tackle this initially where you'd intentionally introduce bugs in your code to generate training data. But not sure how far this can go!
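The self-supervision idea could look roughly like the sketch below: take code assumed to be correct, flip one comparison operator at a time, and keep the line number of the flip as the label. This is only an illustration of the data-generation step. The `SWAPS` table, the `inject_bugs` name and the `clamp` sample are assumptions, it needs Python 3.9+ for `ast.unparse`, and real bugs are of course far more varied than swapped comparisons.

```python
import ast

# One simple family of synthetic bugs: off-by-one style comparison swaps.
SWAPS = {
    ast.Lt: ast.LtE, ast.LtE: ast.Lt,
    ast.Gt: ast.GtE, ast.GtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}


def inject_bugs(source):
    """Yield (buggy_source, line_number) pairs, one per swapped comparison.

    The source is normalised with ast.unparse first, so the line numbers
    refer to the same layout as the buggy source that is yielded.
    """
    tree = ast.parse(ast.unparse(ast.parse(source)))
    sites = [
        (node, i)
        for node in ast.walk(tree)
        if isinstance(node, ast.Compare)
        for i, op in enumerate(node.ops)
        if type(op) in SWAPS
    ]
    for node, i in sites:
        original = node.ops[i]
        node.ops[i] = SWAPS[type(original)]()  # introduce exactly one bug
        yield ast.unparse(tree), node.lineno
        node.ops[i] = original                 # restore before the next one


CLEAN = '''
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
'''

samples = list(inject_bugs(CLEAN))
for buggy_source, line in samples:
    print(line, buggy_source.splitlines()[line - 1].strip())
```

Each sample is a (code, bug location) pair of the kind described above. Whether a model trained on such synthetic bugs transfers to real ones is exactly the open question.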

1. Generate test cases from function/class/method definitions.

2. Generate test cases from fuzz results.

3. Run tests and walk outward from symbols around relevant stacktrace frames (line numbers, etc.).

4. Mutate and run the test again.

...
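A rough sketch of steps 2 and 4 from the list above, under loud assumptions: the function under test (`maximum`), the hand-written `mutant`, and the helper names are all hypothetical, and the "fuzzer" is just seeded random integers. Recorded fuzz results become regression cases, and the same cases are then re-run against a mutated version to see whether they notice the change.

```python
import random


def fuzz_cases(fn, n=200, seed=0):
    """Record fn's behaviour on random inputs as (args, expected) cases."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        args = (rng.randint(-5, 5), rng.randint(-5, 5))
        cases.append((args, fn(*args)))
    return cases


def failing_cases(fn, cases):
    """Return the recorded cases on which fn no longer matches."""
    return [(args, expected) for args, expected in cases
            if fn(*args) != expected]


def maximum(a, b):
    return a if a >= b else b


def mutant(a, b):
    # Hand-written mutant of `maximum`: `>=` flipped to `<=`.
    return a if a <= b else b


cases = fuzz_cases(maximum)
print(len(failing_cases(maximum, cases)))     # original against its own cases
print(len(failing_cases(mutant, cases)) > 0)  # do the cases catch the mutant?
```

Cases recorded this way only pin down current behaviour, so they catch regressions and mutants rather than bugs already present in the original.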

Model-based Testing (MBT) https://en.wikipedia.org/wiki/Model-based_testing

> Models can also be constructed from completed systems

> I'm not aware of active research going on in this area (any pointers would be helpful!).

Look at the static analysis tool in clang. Xcode uses it well.

> Language modelling and "AI", in its current form, uncovers only statistical correlations and barely scratches the surface of what "understanding" is

This is a recurring objection and somewhat unfair. Current architectures have long been known to be universal, capable of reproducing any computational structure (of finite depth for NNs, and Turing complete for RNNs); they have significant structural flexibility and in principle their learning can converge to "ideal processing structures" (which supposedly our brains also approach) given good enough training conditions (data, regimen, etc.). The network scale, timescales and dataset scale needed to achieve comparable human function are debatable and unknown, but I believe it's very safe to judge them on function (this particular example is indeed quite impressive), because given their performance it's likely a powerful structure has emerged under the hood -- you can think of it emerging similarly to how intelligence emerges from evolution (and of course human learning). Internal recurrent evaluations of logic and representations of language can all emerge.

I wouldn't describe this process as simply statistical inference, since it has complex computational priors and structure involved. It's really algorithmic learning.

Of course, you can bake in structure to accelerate this process, and we've been discovering very useful structures (such as CNNs, LSTMs, Transformer arch) which bias the models in the desired direction but still have internal flexibility.

BERT is a language model. It's trained to predict the next character in a sequence. It does not have any capacity to "understand" programming, or anything at all. It also cannot produce outputs that are not similar to the examples it's been trained on. Like all neural net models it can interpolate between its examples, but it can't extrapolate to regions of the sample space it's never seen. This is why I say it lacks the ability to innovate.

I'm not sure how you would combine AutoML-Zero with BERT. How do you mean?

What do you think is a more productive path leading to "AutoCode" ?!

A. Add external definitions or reward formalism to make the code-space easier to search?

OR

B. Keep adding code trees, execution traces, comments, memory dumps and learn from those?

My own instinct is that AlphaZero was a lot more convincing than AlphaStar, so lots of (A) is definitely needed.