Hacker News
by mips_avatar 434 days ago
Yeah, but it kind of kneecaps the model. They need tokens to "think". It's better to have them create a long response and then distill it down later.
2 comments

You need tokens to create more revenue for the company that is running the LLM. Nothing more, nothing less.
Is there a well-known benchmark for this? I don't feel that short vs long answers make any difference, but of course feelings aren't something we can measure.

Also, if that works, why don't Copilot/Cursor write lots of excessive code mixed with lots of prose, only to distill it later?

> don't feel that short vs long answers make any difference

The “thinking” models are really verbose output models that summarise the thinking at the end. These tend to outperform non-thinking models, but at a higher cost.

Anthropic lets you see some/all of the thinking so you can see how the model arrived at the answer.
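
For anyone who wants to try the "long response first, distill later" idea from upthread, here is a minimal sketch. It assumes a hypothetical `complete(prompt)` callable that wraps whatever LLM client you use; that name, the function name, and the prompt wording are illustrative and not any vendor's API. Whether this actually beats asking for a short answer directly is exactly what a benchmark would have to show.

```python
from typing import Callable


def answer_with_distillation(question: str, complete: Callable[[str], str]) -> str:
    """Two-pass prompting: a long reasoning draft first, then a short summary.

    `complete` is a hypothetical stand-in for your own LLM call
    (prompt text in, response text out).
    """
    # Pass 1: let the model spend tokens reasoning at length.
    draft = complete(
        "Think through the following question step by step, "
        "in as much detail as you need.\n\n"
        f"Question: {question}"
    )

    # Pass 2: distill the long draft into a short final answer.
    return complete(
        "Summarize the final answer from the reasoning below "
        "in one or two sentences.\n\n"
        f"Question: {question}\n\nReasoning:\n{draft}"
    )
```

The second call sees the full draft, so the short answer is conditioned on the long one, at the cost of two requests and more tokens.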

So if I replace "answer" with "summarize" that should work then?