
Hacker News

by lolinder 598 days ago

This is a regression in the model's accuracy on certain tasks when using CoT, not in its speed:

> In extensive experiments across all three settings, we find that a diverse collection of state-of-the-art models exhibit significant drop-offs in performance (e.g., up to 36.3% absolute accuracy for OpenAI o1-preview compared to GPT-4o) when using inference-time reasoning compared to zero-shot counterparts.

In other words, the issue they're identifying is that CoT is a less effective model for some tasks than unmodified chat completion, not just that it slows everything down.


mitko 598 days ago

Yeah! That's the danger with any kind of "model", whether it is CoT, CrewAI, or some other way of trying to outsmart the LLM. It's a bet that a programmer/operator can break a large task up better than the LLM can keep its attention on it (assuming the info fits in the context window).

ChatGPT's o1 model could make a lot of those programming techniques less effective, but they may stick around because they are more manageable and more constrained.
