Hacker News
by thorum 377 days ago
Good article. Agree that general unreliability will continue to be an issue since it's fundamental to how LLMs work. However, it would surprise me if there were still a significant gap between single-turn and multi-turn performance in 18 months. Judging by improvements in the last few frontier model releases, I think the top AI labs have finally figured out how to train for multi-turn and agentic capabilities (likely RL) and just need to scale this up.
2 comments

Reasoning is just the worst kind of stopgap measure. The state that should emerge internally is instead forced through automated prompts. And you can clearly see this because the models rarely follow their own "reasoning". It's just automatic self-prompting.
They’re reliable enough for many use cases
What this should be doing is exposing how faulty those use cases are, if they can accept such inconsistent and poorly defined outputs.