by Merad 553 days ago
> LLMs should never do math. They shouldn't count letters or sort lists or play chess or checkers.

But these aren't "gotcha questions"; they're some of the basic interactions that people will want to have with intelligent assistants. Literally just two days ago I was working with the compound interest formula: I asked Claude to solve for a particular variable of the formula, then plug in some numbers to calculate the results (it was able to do it). Could I have used Mathematica or something like that? Yes, of course. But supposedly the whole purpose of a general-purpose AI is that I can use it to do just about anything that I need to do. Likewise, there have been multiple occasions where I've needed ChatGPT or Claude to work with tables or lists of data and return the results sorted.
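For anyone curious what that kind of request looks like outside a chat window, here is a rough sketch. The comment doesn't say which variable was solved for, so this assumes the standard form A = P(1 + r/n)^(nt) rearranged for t; the function names and the sample numbers are made up for illustration.

```python
import math


def future_value(principal, rate, periods_per_year, years):
    """Standard compound interest: A = P * (1 + r/n) ** (n * t)."""
    return principal * (1 + rate / periods_per_year) ** (periods_per_year * years)


def years_to_reach(target, principal, rate, periods_per_year):
    """Same formula solved for t: t = ln(A / P) / (n * ln(1 + r/n))."""
    growth_per_period = 1 + rate / periods_per_year
    return math.log(target / principal) / (periods_per_year * math.log(growth_per_period))


# Illustrative numbers only: how long to double 1000 at 5% compounded monthly?
years = years_to_reach(2000.0, 1000.0, 0.05, 12)
print(round(years, 2))
```

Plugging the result back into `future_value` is a cheap way to check the rearrangement, which is the same check you would want to run on an assistant's answer.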

They're gotchas in the sense that people are intentionally asking LLMs to do things that LLMs are terrible at doing. LLMs are language models. They aren't math models. Or chess models. Or sorting or counting models. They aren't even logic models.

So early on the value was completely in language. But you're absolutely correct that for these tools to really be useful they need to be better than that, and slowly we're getting there. If a math question is one component of your request, the system should first delegate that part to an appropriate math engine as one of a series of CoT steps, and likewise for other specialized tasks.
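To make the delegation idea concrete, here is a toy sketch, not how any particular product does it. It assumes the steps have already been labeled as "math" or "language" (the `kind`/`payload` step format is invented for this example), and the "math engine" is just a small arithmetic evaluator built on Python's `ast` module.

```python
import ast
import operator

# Arithmetic operators the toy "math engine" is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def evaluate_arithmetic(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


def answer_step(step):
    """Route one step: math goes to the engine, the rest stays with the model.

    The step dict format is hypothetical; the language branch is a stub.
    """
    if step["kind"] == "math":
        return str(evaluate_arithmetic(step["payload"]))
    return "[language model handles: " + step["payload"] + "]"


print(answer_step({"kind": "math", "payload": "2 + 3 * 4"}))
```

The point is only that the arithmetic never passes through the language model: the model's job shrinks to recognizing that a step is math and handing it off.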

If this stuff is getting sold as a revolution in information work, or a watershed moment in technology, or as a cultural step-change, etc, then I think the gotcha is totally fair. There seems to be no limit to the hype or sales pitch. So there need be no bounds for pedantic gotchas either.

I entirely agree with you. Trying to roll out just a raw LLM was always silly, and remains basically a false promise. Simply increasing the number of layers or parameters or transformer complexity will never resolve these core gaps.

But the field is rapidly making progress. The promise will actually be met by the reality once CoT models are coupled with actual domain-specific logic engines (math, chemistry, physics, chess, and so on).

With general mathematical questions, I've often found WolframAlpha surprisingly helpful.