|
|
|
|
|
by drtz
23 days ago
|
|
> It's a weird fact claim, because the ground truth is "nobody knows for sure" and that's not one of the available options. It's even weirder to suggest that the disagreement is indicative of a problem. If you asked five very knowledgeable humans on this subject to select the correct answer on a multiple-choice questionnaire, they would almost certainly vary significantly more than these 5 LLMs. Not to say that hallucination isn't a problem, but this is a lousy way to test it. |
|
These types of experiments prove to me that there is no real "reasoning" happening and "reasoning/thinking" tokens as a concept are mostly there to convince people to use models that consume more tokens and produce more revenue. The output from reasoning models might be more accurate, but its just a consequence of a longer inference runtime, there is no "reasoning" happening, reasoning is just sales/UX bullsh*t.