
by seanhunter 77 days ago

I don't know what you mean by that. We always know what's going on under the hood: linear algebra, the attention mechanism, etc.

To a first approximation, all "chain of thought" means is that instead of you having to prompt the model to discuss everything in text and then decide at the end[1], it now sort of does that automatically, so you don't need to prompt it.

[1] This used to bring very substantial performance improvements on some tasks.
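For anyone who hasn't seen the older, manual approach described above, here is a minimal Python sketch, assuming a plain text-completion setup: the prompt asks the model to reason in text and put its decision on a final line, and a helper pulls that answer back out. The template wording, the `Answer:` marker, and both function names are hypothetical illustrations, not any particular provider's API; no model is actually called.

```python
from typing import Optional

# Hypothetical marker the prompt asks the model to put before its final answer.
ANSWER_MARKER = "Answer:"


def build_cot_prompt(question: str) -> str:
    """Wrap a question so the model writes out its reasoning before answering."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step in plain text, then give your "
        f"final answer on its own last line, starting with '{ANSWER_MARKER}'.\n"
    )


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last line starting with the marker, or None."""
    for line in reversed(completion.splitlines()):
        stripped = line.strip()
        if stripped.startswith(ANSWER_MARKER):
            return stripped[len(ANSWER_MARKER):].strip()
    return None


# Usage with a made-up completion standing in for real model output.
prompt = build_cot_prompt("What is 17 * 24?")
completion = "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408\nAnswer: 408"
print(extract_final_answer(completion))  # 408
```

The point of the "decide at the end" structure is that the answer line is generated after the reasoning text, so the reasoning tokens are already in the context when the answer is produced.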


fc417fc802 77 days ago

I think it was clear from context that "under the hood" referred not to the math but to the contents of the trace. What's written (often?) isn't what's actually being "thought" about. The trace is a trained output, much like the final output, which is to say that it's fake. There are research papers on the topic, including some showing that models can be trained to print other arbitrary stuff during the "thinking" phase instead.

You can easily see this for yourself by walking through a given trace carefully with a critical eye. Here's an example of my own from a few days ago: https://news.ycombinator.com/item?id=47623324


seanhunter 77 days ago

Yeah, now I get what you're saying. Yes, the trace isn't what's actually happening. What's actually happening is just the attention mechanism etc. The model doesn't "think" in human language; it thinks in linear algebra. The thing is that before chain of thought, it used to be necessary to get the model to output some language, because that was the only thing it had to attach processing to (so if you wanted more processing, you needed to get it to generate more text). Now we get the model to generate some text that is a simulacrum of the thought it might hypothetically be doing, but in actual practice chain of thought is just something they get the model to do by training it in a certain way.
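The point about processing being tied to generated text follows from how autoregressive decoding works: the model emits one token per forward pass, so getting it to write more text buys more passes. The toy Python loop below is a sketch under loud assumptions: `forward` is a hypothetical stand-in for a real model that maps the current token sequence to the next token id, decoding is greedy, and a single stop token ends generation. Real implementations cache attention state instead of reprocessing the whole context each step, but they still run one pass per new token.

```python
from typing import Callable, List, Tuple


def generate(
    forward: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int,
    stop_token: int,
) -> Tuple[List[int], int]:
    """Greedy autoregressive decoding with a hypothetical `forward` function.

    Returns the generated tokens (including the stop token, if reached) and
    the number of forward passes used, which always equals len(generated).
    """
    tokens = list(prompt)
    generated: List[int] = []
    passes = 0
    for _ in range(max_new_tokens):
        next_token = forward(tokens)  # one forward pass per emitted token
        passes += 1
        tokens.append(next_token)  # the output becomes part of the next input
        generated.append(next_token)
        if next_token == stop_token:
            break
    return generated, passes


def make_toy_model(stop_at_length: int) -> Callable[[List[int]], int]:
    """Hypothetical toy model: emits the context length as the next token
    until the context reaches stop_at_length, then emits 0 (the stop token)."""

    def forward(context: List[int]) -> int:
        return 0 if len(context) >= stop_at_length else len(context)

    return forward


# A "terse" toy model stops early; a "verbose" one writes more before stopping
# and therefore gets more forward passes (i.e. more computation).
_, terse_passes = generate(make_toy_model(5), [1, 2, 3], 100, stop_token=0)
_, verbose_passes = generate(make_toy_model(20), [1, 2, 3], 100, stop_token=0)
print(terse_passes, verbose_passes)  # 3 18
```

Per-token work is roughly fixed by the network's depth and width (attention cost also grows with context length), so in this picture the only way to spend more computation on a problem is to emit more tokens, whether or not those tokens faithfully describe the computation.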
