by kube-system
537 days ago
> Calculators are different since the output is correct as long as the input is correct.

That isn't really true.[0] The application of calculators to a subject matter is something that does need to be considered in some use cases. LLMs also have accuracy considerations, and although it may be to a different degree, the subject matter to which they're applicable has a broad range of acceptable accuracies. While some textual subject matter demands a very specific answer, some doesn't: for example, there may be hundreds or thousands of ways to summarize a text that could be accurate for a particular application.

[0]: example: https://www.reddit.com/r/calculus/comments/upjdn4/why_do_all...

The issue with LLMs is that their behaviour can be so unpredictable. Take the following prompt, which asks GPT-4o mini to validate the response to "calculate 2+3+5 and only display the result":
https://beta.gitsense.com/?chat=6d8af370-1ae6-4a36-961d-2902...
GPT-4o mini contradicts itself, which is not what one would expect on a task we believe to be extremely simple. However, if you ask it to validate the response to "calculate 2+3+5," it gets it right.
https://beta.gitsense.com/?chat=43221de5-bff6-487a-8c0f-48ca...
By adding "and only display the result," GPT-4o mini was thrown for a loop; examples like this should give us pause.
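
For contrast, here is a rough sketch of what a deterministic check of the same response could look like. The function name `validate_sum_response` and the rule that "only display the result" means the reply must be the bare number are my own assumptions for illustration, not what the linked chats use.

```python
def validate_sum_response(terms, response):
    """Return True if `response` is exactly the sum of `terms` and nothing else.

    Hypothetical helper: "only display the result" is interpreted here as
    "the reply, ignoring surrounding whitespace, is just the number".
    """
    expected = str(sum(terms))
    return response.strip() == expected


print(validate_sum_response([2, 3, 5], "10"))          # True
print(validate_sum_response([2, 3, 5], "2+3+5 = 10"))  # False: extra text besides the result
```

A check like this gives the same verdict every time for the same input, which is exactly the property the two chats above show the model lacking once the instruction is reworded.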