| HN Mirror


	by chank 848 days ago
	The answer is still no, and still for the above reason. Compute resources are only relevant to how fast it can answer, not to the quality of the answer.

pixl97 848 days ago

Then why does chain of thought work better than asking for short answers?

p1esk 848 days ago

Because it’s a better prompt. Works better for people too.

famouswaffles 848 days ago

That's not the only reason.

More tokens = more useful compute towards making a prediction. A query with more tokens before the question is literally giving the LLM more "thinking time".
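A rough way to see the "more compute" part, as a sketch only: in causal self-attention, the position that predicts the next token scores its query against every earlier position, in every head of every layer. The function name and the default layer and head counts below are made up for illustration, and the count ignores everything except query-key scores. It says how much work goes into the prediction, not whether that work improves the answer.

```python
def scores_for_next_token(context_len, n_layers=12, n_heads=12):
    """Count query-key dot products used to predict one next token.

    Assumption: plain causal self-attention, where the last position
    attends over all `context_len` positions in each head of each layer.
    The defaults are hypothetical, not taken from any specific model.
    """
    return context_len * n_layers * n_heads


# The count grows linearly with the number of tokens before the question.
for n in (10, 100, 1000):
    print(n, scores_for_next_token(n))
```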

razodactyl 848 days ago

It correlates but the intuition is a bit misleading. What's actually happening is that by asking a model to generate more tokens, it increases the amount of information it has learnt to be present in its context block.

It's why "RAG" techniques work: the models learn during training to make use of information in context.

At the core of self-attention is a dot-product similarity measure, which causes the model to act like a search engine.

It's helpful to think about it in terms of search: the shape of the outputs looks like conversation, but we're actually prompting the model to surface information from the QKV matrices internally.

Does it feel familiar? When we brainstorm we usually chart graphs of related concepts e.g. blueberry -> pie -> apple.
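To make the search analogy concrete, here is a minimal sketch of scaled dot-product attention for a single query in plain Python. The `attend` helper and the toy keys and values are made up for illustration. A real model learns the projections that produce queries, keys, and values, which this skips entirely.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-query scaled dot-product attention (illustrative only).

    Each key is scored against the query with a dot product, the scores
    are turned into weights with softmax, and the output is the weighted
    average of the values: a "soft" lookup rather than an exact match.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output


# Toy "index": three keys, each with a one-dimensional value.
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
values = [[10.0], [20.0], [30.0]]

# The query points in the same direction as the first key,
# so the first entry receives the largest weight.
weights, output = attend([4.0, 0.0], keys, values)
print(weights, output)
```

The key most aligned with the query dominates the result, but the other entries still contribute a little, which is where the comparison with a ranked search breaks down slightly.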

famouswaffles 847 days ago

>What's actually happening is that by asking a model to generate more tokens, it increases the amount of information it has learnt to be present in its context block.

I'm not saying this isn't part of it, but even if it's just dummy tokens without any new information, it works.

https://arxiv.org/abs/2310.02226

p1esk 848 days ago

It’s not clear that more tokens are better.

famouswaffles 848 days ago

I think it's pretty clear:

https://arxiv.org/abs/2310.02226

I mean, I can imagine you wouldn't always need the extra compute.

p1esk 848 days ago

This paper is a great illustration of how little is understood about this question. They discovered that appending dummy tokens (ignored during both training and inference) improves performance somehow. Don’t confuse their guess as to why this might be happening with actual understanding. But in any case, this phenomenon has little to do with increasing the size of the prompt using meaningful tokens. We still have no clue if it helps or not.
