| HN Mirror



	by overgard 338 days ago
	We need to stop calling what we have AI. LLMs can't reliably reason. Until they can, the progress is far from unstoppable.

1 comments

kadushka 337 days ago

I love how people are transitioning from “LLMs can’t reason” to “LLMs can’t reliably reason”.

overgard 336 days ago

Well, I was hedging a bit because I try not to overstate the case, but I'm just as happy to say: LLMs can't reason. Because it's not what they're built to do. They predict what text is likely to appear next.

But even if they can appear to reason, if it's not reliable, it doesn't matter. You wouldn't trust a tax advisor that makes things up 1/10 times, or even 1/100 times. If you're going to replace humans, "reliable" and "reproducible" are the most important things.

kadushka 335 days ago

Frontier models like o3 reason better than most humans. Definitely better than me. It would wipe the floor with me in a debate - on any topic, every single time.

charleshn 337 days ago

Frontier models went from not being able to count the number of 'r's in "strawberry" to getting gold at IMO in under 2 years [0], and people keep repeating the same clichés such as "LLMs can't reason" or "they're just next token predictors".

At this point, I think it can only be explained by ignorance, bad faith, or fear of becoming irrelevant.

[0] https://x.com/alexwei_/status/1946477742855532918

bwfan123 337 days ago

> At this point, I think it can only be explained by ignorance, bad faith, or fear of becoming irrelevant.

Based on the past history with FrontierMath and AIME 2025 [1], [2], I would not trust announcements which can't be independently verified. I am excited to try it out though.

Also, the performance of LLMs was not even bronze [3].

Finally, this article shows that LLMs were just mostly bluffing [4].

[1] https://www.reddit.com/r/slatestarcodex/comments/1i53ih7/fro...

[2] https://x.com/DimitrisPapail/status/1888325914603516214

[3] https://matharena.ai/imo/

[4] https://arxiv.org/pdf/2503.21934

yeasku 337 days ago

OpenAI is 10 years old and an LLM just told me a dollar is 1.03 euros.

kadushka 337 days ago

I don’t know which LLM you used - I just asked gpt-4.1 - it did a web search and provided the correct exchange rate. It took about 5 seconds.

yeasku 336 days ago

That is what we did after ChatGPT failed: a web search.

It took us about 5 seconds.
