by kube-system 537 days ago

> Calculators are different since the output is correct as long as the input is correct.

That isn't really true.[0] The application of calculators to a subject matter is something that does need to be considered in some use cases.

LLMs also have accuracy considerations, and although it may be to a different degree, the subject matter to which they're applicable has a broad range of acceptable accuracies. While some textual subject matter demands a very specific answer, some doesn't: for example, there may be hundreds or thousands of ways to summarize a text that would all be accurate enough for a particular application.

0: example: https://www.reddit.com/r/calculus/comments/upjdn4/why_do_all...
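
To illustrate the general point that numeric tools can return slightly wrong output for correct input, here is a minimal Python sketch. It is not the calculator case from the linked thread; it only assumes ordinary binary floating point, and the helper names `naive_sum` and `exact_sum` are made up for this example.

```python
import math
from fractions import Fraction

def naive_sum(terms):
    """Add decimal strings using binary floating point, one term at a time."""
    total = 0.0
    for t in terms:
        total += float(t)  # each addition can introduce a tiny rounding error
    return total

def exact_sum(terms):
    """Add the same decimal strings exactly, using rational arithmetic."""
    return sum((Fraction(t) for t in terms), Fraction(0))

terms = ["0.1"] * 10            # ten tenths; the exact answer is 1
approx = naive_sum(terms)
exact = exact_sum(terms)

print(approx == 1.0)             # False: the float result is slightly off
print(math.isclose(approx, 1.0)) # True: close enough for most everyday uses
print(exact == 1)                # True: exact arithmetic has no drift
```

Whether the small error matters depends on the application, which is the same "acceptable accuracy" question raised above.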

sdesol 537 days ago

I think your point stands, but your example shows that anyone using those calculators for everyday work should not be concerned. Those who need precision to 6+ decimal places for complex equations should know not to fully trust consumer-grade calculators.

The issue with LLMs is that they can be so unpredictable in their behaviour. Take the following prompt that asks GPT-4o mini to validate the response to "calculate 2+3+5 and only display the result":

https://beta.gitsense.com/?chat=6d8af370-1ae6-4a36-961d-2902...

GPT-4o mini contradicts itself, which is not something one would expect for something we believe to be extremely simple. However, if you ask it to validate the response to "calculate 2+3+5," it will get it right.

https://beta.gitsense.com/?chat=43221de5-bff6-487a-8c0f-48ca...

By adding "and only display the result," GPT-4o mini was thrown for a loop; examples like this should give us pause.

kube-system 537 days ago

Well, not every tool is a hammer and not every problem is a nail.

If I ask my TI-89 to "Summarize the plot in Harry Potter and the Chamber of Secrets" it responds "ERR"! :D

LLMs are good text processors, pocket calculators are good number processors. Both have limitations, and neither is good at problem sets that are outside of its design strengths. The biggest problem with LLMs isn't that they are bad at a lot of things, it's that they look like they are good at things they aren't good at.

sdesol 537 days ago

I agree LLMs are good at text processing and I believe they will obsolete jobs that really should be obsoleted. Unless OpenAI, Anthropic and other AI companies come up with a breakthrough on reliability, I think it will be fair to say they will only be players and not leaders. If they can't figure something out, it will be Microsoft, Amazon and Google (distributors of diverse models) that will benefit the most.

I've personally found it is extremely unlikely for multiple good LLMs to fail at the same time. So if you want to process text and be confident in the results, I would just run the same task across 5 good models; if a supermajority of them agree, you can be reasonably confident that it was done right.
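
A rough sketch of that voting idea in Python, under some loud assumptions: the model outputs are already collected as strings, "agreement" means an exact match after trimming whitespace and lowercasing, and a supermajority means at least two thirds. The function name `supermajority_answer` and the threshold are hypothetical choices for illustration, not anything from a real library.

```python
from collections import Counter
from fractions import Fraction

def supermajority_answer(answers, threshold=Fraction(2, 3)):
    """Return the answer shared by at least `threshold` of the models, else None.

    `answers` is a list of strings, one per model (assumed already collected).
    Matching is exact after normalization, which is an assumption; free-form
    text would need a fuzzier notion of agreement.
    """
    if not answers:
        return None
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    # Compare as exact fractions to avoid floating-point edge cases.
    if Fraction(count, len(normalized)) >= threshold:
        return winner
    return None

# Hypothetical outputs from 5 models for "calculate 2+3+5 and only display the result"
print(supermajority_answer(["10", "10", " 10", "10", "12"]))  # 4 of 5 agree
print(supermajority_answer(["10", "10", "12", "12", "11"]))   # no supermajority
```

With 5 models and a two-thirds threshold, 4 matching answers pass and 3 do not. This only works as-is for short, checkable outputs; summaries and other open-ended text would need some way to judge equivalence first.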
