	by starik36 1284 days ago
	Ok, but how does it take my code and fix the bug? It's my own code; no one has seen it besides me, and the model wasn't trained on it.

4 comments

ravi-delia 1284 days ago

In learning to predict the next token, the model has to pick up lots of little bits of world knowledge. I'm sure someone would disagree with the phrasing of "understand", but it certainly operates with more complexity than, say, a Markov chain. It has seen lots of Python, and in order to predict better, it has developed internal models of how Python works. Think of how much better you'd do predicting the next character of Python code compared to random noise: there's a lot of structure there.
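
For comparison, here is a minimal sketch of the kind of frequency-table Markov chain mentioned above: an order-k character model that only counts which character followed each context. The function names and the tiny corpus are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict

def train_char_markov(text, order=2):
    """Count which character follows each `order`-length context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts

def next_char_distribution(counts, context):
    """Return {char: probability} for a context, or {} if it was never seen."""
    seen = counts.get(context)
    if not seen:
        return {}
    total = sum(seen.values())
    return {ch: n / total for ch, n in seen.items()}

# Hypothetical toy corpus; a real model would need far more text.
corpus = "def add(a, b):\n    return a + b\n"
model = train_char_markov(corpus, order=2)
print(next_char_distribution(model, "de"))  # a context that appears in the corpus
print(next_char_distribution(model, "zz"))  # an unseen context yields nothing
```

A table like this has nothing to say about a context it has never seen, which is exactly the situation with someone's private code.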

zorr 1284 days ago

In my (limited) experience it seems to perform even better for typed languages (for example Kotlin/Java/Swift) than for Python. The Python code it provided often had subtle type issues when working with dates, while the Kotlin date-related code was more accurate and correct in terms of types. That makes sense, since the additional type information likely leads to a much better "internal model of how Kotlin works".

What surprised me was the level of "understanding" it seems to show when I provide it with some of my own sample code. It can analyze the code, explain how it works and what it does, use libraries, suggest improvements, and apply those improvements.

Have a look at this conversation: https://imgur.com/a/ZtViC3d

While the end result isn't perfect, it's still highly impressive and while I was an AI-skeptic before, I now see the possible benefits of AI assistants for programming.

Some other prompts with very impressive results:

* "Write an implementation for the following Kotlin repository interface: <insert-interface-with-full-type-signatures>."

* (followup) "Add save/load methods that store the backing map in a JSON file"

* (followup) "Replace Gson with Jackson for JSON serialization"

* "Write an Android layout xml for a login form with username/password/loginbutton"

* (followup) "Provide the Kotlin activity code for this layout"

* "Write a Kotlin function that parses a semver input string into a data class"

FiberBundle 1284 days ago

I think another possibility here is that they might have used an execution environment to check whether the code the model came up with actually compiles, and used that as additional input during training. Some sort of execution environment also seems to me a possible explanation for how they managed to get the model to emulate a terminal so well.
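
As a rough illustration of what such a check could look like, here is a minimal sketch using Python's built-in `compile`. Nothing in this thread confirms that anything like it was used in training; the helper name `compiles` is hypothetical, and the check only covers syntax without ever running the code.

```python
def compiles(source: str) -> bool:
    """Return True if `source` parses as Python; the code is never executed."""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except (SyntaxError, ValueError):
        # SyntaxError: invalid code; ValueError: e.g. null bytes on some versions.
        return False

print(compiles("x = 1"))    # syntactically valid
print(compiles("def f(:"))  # syntactically invalid
```

A signal like this says nothing about whether the code is correct, only whether it parses.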

jameshart 1283 days ago

It’s not ‘more complexity’ than a Markov chain - it essentially is a Markov chain, just looking at a really deep sequence of preceding tokens to decide the probabilities for what comes next.

And it’s not just looking that up in a state machine, it’s ‘calculating’ it based on weights.

But in terms of ‘take sequence of input tokens; use them to decide probable next token’, it’s functionally indistinguishable from a Markov chain.

ravi-delia 1283 days ago

I look at deep sequences of tokens and predict what comes next; can you milk me? Once you've broadened "basically a Markov chain" to "any function from a sequence of tokens to a probability distribution of tokens", a lot of explanatory power is lost. If you had to characterize the difference between brute-force mappings based on pure frequencies and a model that selectively calculates probabilities based on underlying structure, wouldn't you say the latter had more complexity?

You don't have to believe the hype, but if you think you can get GPT performance out of anything remotely resembling a Markov chain, I encourage you to try.

jameshart 1283 days ago

There's nothing about Markov chains that says the model has to be based on brute calculation from previously observed frequencies. The point is that the exact behavior of these LLMs could also be modeled as a Markov chain with a sufficiently massive state machine.

Obviously that's impractical and not how LLMs actually work - they derive the transition probabilities for a state from the input, rather than having it pre-baked - but I think from the point of view of saying 'these are more sophisticated than a Markov chain', actually strictly speaking they aren't - they are in fact a lossy compression of a Markov model.
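
The formal point can be sketched in a few lines: any function from a bounded context window to a next-token distribution induces a Markov chain whose states are the windows themselves. Everything below is illustrative; `toy_model` is a hypothetical stand-in for an LLM and `make_transition` is a made-up helper name.

```python
from typing import Callable, Dict, Tuple

Token = str
State = Tuple[Token, ...]  # the last `window` tokens

def make_transition(next_token_probs: Callable[[State], Dict[Token, float]],
                    window: int) -> Callable[[State], Dict[State, float]]:
    """Turn a next-token function into a transition function over window states."""
    def transition(state: State) -> Dict[State, float]:
        out: Dict[State, float] = {}
        for token, p in next_token_probs(state).items():
            new_state = (state + (token,))[-window:]  # slide the window forward
            out[new_state] = out.get(new_state, 0.0) + p
        return out
    return transition

def toy_model(state: State) -> Dict[Token, float]:
    """Hypothetical stand-in: probabilities are computed, not looked up."""
    if state and state[-1] == "a":
        return {"b": 1.0}
    return {"a": 0.5, "b": 0.5}

step = make_transition(toy_model, window=2)
print(step(("b", "a")))
print(step(("a", "b")))
```

With a vocabulary of V tokens and a window of n tokens there are up to V^n such states, which is why writing the table out explicitly is impractical.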

krackers 1283 days ago

But it seems like the attention mechanism fundamentally isn't Markov-like, in that at a given position it can pool information from all other positions. So in the simplest case, when trained on masked-language modeling, the prediction of the mask in "Capital of [MASK] is Paris" can depend bidirectionally on all surrounding context. I guess it's true that when the mask is at the end (for next-token completion), you could consider this a Markov model with each state being the max attention window (2048 tokens I think?), but that's like saying all real-world computers are FSMs: it's technically true, but it isn't the best model to use for actually understanding its behavior.

Since for most inputs that are smaller than the max token length you never actually end up using the Markov-ness, calling it a Markov model seems like just another way of saying it's a function that provides a probability distribution for the next token given the previous tokens. That just pushes the question back onto how that function is defined.

larsejonasson 1274 days ago

Could you not use two Markov chains for masked language modeling? One working from the beginning until [MASK] and one working backwards from the end until [MASK]. Then set [MASK] to the average of both chains. If a direct average cannot be found, it is assumed to be a multi-word expression, and words are generated from the two chains until they match.
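
A minimal sketch of the single-word case of this idea, assuming order-1 (bigram) chains and a plain average of the two distributions. The function names and the three-sentence corpus are hypothetical, and the multi-word fallback is not implemented.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Build a forward chain (word -> next word) and a backward chain (word -> previous word)."""
    forward = defaultdict(Counter)
    backward = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for left, right in zip(words, words[1:]):
            forward[left][right] += 1
            backward[right][left] += 1
    return forward, backward

def normalize(counter):
    """Turn raw counts into a probability distribution."""
    total = sum(counter.values())
    return {w: n / total for w, n in counter.items()} if total else {}

def fill_mask(words, forward, backward, mask="[MASK]"):
    """Pick the word with the highest average of the forward and backward probabilities."""
    i = words.index(mask)
    left = normalize(forward.get(words[i - 1], Counter())) if i > 0 else {}
    right = normalize(backward.get(words[i + 1], Counter())) if i + 1 < len(words) else {}
    scores = {w: (left.get(w, 0.0) + right.get(w, 0.0)) / 2
              for w in set(left) | set(right)}
    return max(scores, key=scores.get) if scores else None

# Hypothetical toy corpus.
corpus = [
    "capital of France is Paris",
    "capital of Italy is Rome",
    "capital of France is lovely",
]
forward, backward = train_bigrams(corpus)
print(fill_mask("capital of [MASK] is Paris".split(), forward, backward))
```

With order-1 chains only the immediate neighbours "of" and "is" influence the guess, so "Paris" plays no role here; using it would require higher-order chains and correspondingly larger tables.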

jarenmf 1284 days ago

It's really awesome how good it is at modeling certain world knowledge. But it seems to struggle with putting everything into one framework. For example, it still makes a lot of mathematics and logic errors.

hugh-avherald 1284 days ago

How do you fix a bug? You've never seen it before.

spion 1284 days ago

This is why this paper was so exciting when it came out: https://arxiv.org/abs/2005.14165

Make a large enough model and train it with all sorts of data, and it will be able to encode generalized concepts which can then be applied to specific tasks (given only a few examples of the task, or even just a query / question, rather than an example).

tluyben2 1284 days ago

It often does[0]. I am currently running more experiments by writing the fixed code back in place and running the tests automatically.

[0] https://brainfisheatfishbrain.com/post/chatgpt-code-reviews/
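
One way such a loop could look, as a rough sketch only: the setup described in the comment above is not shown, so the file names, the `passes_tests` helper, and the use of `unittest` are all assumptions. The candidate fix is written to a temporary directory and the tests run in a subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def passes_tests(candidate_source: str, test_source: str) -> bool:
    """Write the candidate and its tests to a temp dir and run unittest there."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "candidate.py").write_text(candidate_source)
        Path(tmp, "test_candidate.py").write_text(test_source)
        result = subprocess.run(
            [sys.executable, "-m", "unittest", "test_candidate"],
            cwd=tmp, capture_output=True, text=True, timeout=60,
        )
        return result.returncode == 0  # unittest exits non-zero on any failure

# Hypothetical example: a buggy function, a proposed fix, and one test.
TESTS = """
import unittest
from candidate import add

class AddTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
"""
buggy = "def add(a, b):\n    return a - b\n"
fixed = "def add(a, b):\n    return a + b\n"
print(passes_tests(buggy, TESTS), passes_tests(fixed, TESTS))
```

Running model-generated code this way executes it on your machine, so a sandbox or container is advisable for anything beyond toy examples.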
