by cs702 1160 days ago
In hindsight, it's the most natural, most obvious next step to get LLMs to write better code:

Explain to them how to debug and fix the code they've written.

Which is pretty much what you would do with an inexperienced human software developer.

Looking at this with fresh eyes, it's both shocking to me that this sort of thing is even possible, and yet also completely unsurprising as yet another emergent capability of LLMs.

We live in interesting times.

2 comments

Are they actually running the code, and evaluating the output? Or is it debug-by-code-review?

Beware of bugs in the above code; I have only proved it correct, not tried it. - Knuth

They're doing both. Quoting from Figure 1, "the model first generates new code, then the code is executed and the model explains the code. The code explanation along with the execution results constitute the feedback message, which is then sent back to the model to perform more debugging steps. When unit tests are not available, the feedback can be purely based on code explanation."
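For anyone who wants the loop from that Figure 1 quote in concrete form, here is a minimal sketch. It is not the paper's implementation: `StubModel`, `generate`, `explain`, `run_tests`, and `self_debug` are hypothetical names, and the stub simply returns a buggy draft first and a corrected one second so the loop can run without a real LLM.

```python
from typing import Any, List, Optional, Tuple

TestCase = Tuple[tuple, Any]  # (positional args, expected return value)


def run_tests(code: str, func_name: str, tests: List[TestCase]) -> Optional[str]:
    """Execute the candidate code; return a failure message, or None if all tests pass."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        func = namespace[func_name]
        for args, expected in tests:
            got = func(*args)
            if got != expected:
                return f"{func_name}{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:  # syntax errors, missing function, runtime errors
        return f"{type(exc).__name__}: {exc}"
    return None


class StubModel:
    """Hypothetical stand-in for an LLM: a buggy draft first, then a fixed one."""

    def __init__(self) -> None:
        self.drafts = [
            "def add(a, b):\n    return a - b\n",
            "def add(a, b):\n    return a + b\n",
        ]
        self.calls = 0

    def generate(self, task: str, feedback: str) -> str:
        draft = self.drafts[min(self.calls, len(self.drafts) - 1)]
        self.calls += 1
        return draft

    def explain(self, code: str) -> str:
        return "The function combines a and b and returns the result."


def self_debug(model, task: str, func_name: str,
               tests: Optional[List[TestCase]] = None,
               max_rounds: int = 3) -> Tuple[str, int]:
    """Generate, execute, explain, feed back; returns (final code, rounds used)."""
    feedback = ""
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = model.generate(task, feedback)
        explanation = model.explain(code)
        if tests is None:
            # No unit tests: feedback is the code explanation alone.
            feedback = explanation
            continue
        failure = run_tests(code, func_name, tests)
        if failure is None:
            return code, round_no
        feedback = f"{explanation}\nExecution result: {failure}"
    return code, max_rounds


final_code, rounds = self_debug(StubModel(), "Write add(a, b)", "add",
                                tests=[((1, 2), 3), ((0, 0), 0)])
```

With the stub, the first draft fails the `(1, 2) -> 3` test, the failure message goes back as feedback, and the second draft passes.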
So it only evaluates the output against unit tests, i.e. a fitness function. AI TDD.
Not too shocking for me after this paper. https://arxiv.org/abs/2211.09066

You can teach GPT-3 arithmetic - https://imgur.com/a/w3DAYOi

Basically 100% accuracy up to about 13 digit addition and >90% after that.

What else can you teach GPT without changing weights?
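To make "teaching arithmetic" concrete: the idea is to spell out the grade-school algorithm step by step in the prompt. Below is a rough sketch of that kind of digit-by-digit trace. The trace wording and the function name `add_with_trace` are my own assumptions, not the prompt from the linked paper or image.

```python
from typing import List, Tuple


def add_with_trace(a: str, b: str) -> Tuple[str, List[str]]:
    """Add two non-negative decimal strings, recording each column step."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter operand with zeros
    carry = 0
    digits: List[str] = []
    trace: List[str] = []
    # Walk the columns right to left, as in pencil-and-paper addition.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digit, new_carry = total % 10, total // 10
        trace.append(
            f"{da} + {db} + carry {carry} = {total}: write {digit}, carry {new_carry}"
        )
        digits.append(str(digit))
        carry = new_carry
    if carry:
        digits.append(str(carry))  # leftover carry becomes the leading digit
    return "".join(reversed(digits)), trace


result, steps = add_with_trace("128", "367")
for line in steps:
    print(line)
print(result)
```

Each trace line is the sort of intermediate step a prompt could demonstrate, so that the model copies the procedure instead of guessing the sum in one shot.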

> and >90% after that

This is such a circular thing that it's amazing to see.

The reason LLMs use a NN is that they're trying to encode a probability function for generating the passage.

And now you are encoding another n-gram follower exercise (i.e., 1 + 1 = 2) on top of it :)

Yeah... and I'm kind of suspicious of the whole "without changing the weights" deal, because adding working context to the model, like telling it the algorithm for adding numbers, really sounds like there's some model state that's getting mutated, even if it's not stored in a file called weights.dat or whatever.
I meant shocking in the sense that it makes me gape in awe, but as I wrote, it's also, simultaneously, completely unsurprising given all the new emergent capabilities we keep discovering. We're in agreement :-)
Oh, yes, well, that's fair.
> 100% accuracy up to about 13 digit addition

The graphs you just posted do not support that; they'd support at most 100% accuracy up to 4 digits.

It's GPT, so 13 = 4.
It's 100% at 13 digits and extremely close to that at shorter lengths. Maybe "basically 100%" is the better way to put it.
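One way to settle this kind of disagreement is to tabulate exact-match accuracy per operand length rather than read it off a graph. Here is a small sketch of such a harness. `accuracy_by_digits` and `flaky_solver` are hypothetical names, and the solver is a made-up stand-in that is exact below 5 digits and wrong above, not a measurement of GPT-3.

```python
import random
from typing import Callable, Dict


def accuracy_by_digits(solver: Callable[[int, int], int],
                       max_digits: int,
                       trials: int = 200,
                       seed: int = 0) -> Dict[int, float]:
    """Exact-match accuracy of `solver` on random n-digit additions, for n = 1..max_digits."""
    rng = random.Random(seed)  # fixed seed so the table is reproducible
    results: Dict[int, float] = {}
    for n in range(1, max_digits + 1):
        lo, hi = 10 ** (n - 1), 10 ** n - 1  # smallest and largest n-digit numbers
        correct = 0
        for _ in range(trials):
            a, b = rng.randint(lo, hi), rng.randint(lo, hi)
            if solver(a, b) == a + b:
                correct += 1
        results[n] = correct / trials
    return results


def flaky_solver(a: int, b: int) -> int:
    """Hypothetical stand-in: correct when the first operand has at most 4 digits."""
    return a + b if a < 10 ** 4 else a + b + 1


table = accuracy_by_digits(flaky_solver, max_digits=6)
for n, acc in table.items():
    print(f"{n} digits: {acc:.0%}")
```

Swapping `flaky_solver` for a function that calls a real model would give per-length numbers that can be compared directly against claims like "100% up to 13 digits".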