HN Mirror


	by thorum 377 days ago
	Good article. Agree that general unreliability will continue to be an issue since it's fundamental to how LLMs work. However, it would surprise me if there was still a significant gap between single-turn and multi-turn performance in 18 months. Judging by improvements in the last few frontier model releases, I think the top AI labs have finally figured out how to train for multi-turn and agentic capabilities (likely RL) and just need to scale this up.

karn97 377 days ago

Reasoning is just the worst kind of stopgap measure. The state that should emerge internally is instead forced through automated prompting. You can see this clearly because the models rarely follow their own "reasoning". It's just automated self-prompting.

koakuma-chan 377 days ago

They’re reliable enough for many use cases.

bluefirebrand 377 days ago

What this should be doing is exposing how faulty those use cases are, if they can accept such inconsistent and poorly defined outputs.
