by metacritic12
1225 days ago
To your point. I find the 2+2=5 cases more interesting, and would like to see more of those: when does it happen? When is ChatGPT most useful? Most deceptive? The 80085 case is only interesting insofar as it reveals weaknesses in the tool, but it's so far from tool-use that it doesn't seem very relevant.
Agreed on the meta-point that deliberate tool misuse, while amusing and sometimes concerning, isn't determinative of the fate of the technology.
But the failure rate without tool misuse seems quite high anecdotally, which also comports with our understanding of LLMs: hallucinations are quite common once you stray even slightly outside of things that are heavily present in the training data. Height of the Eiffel Tower? High accuracy in recall. Is this arbitrary restaurant in Barcelona any good? Very low accuracy.
The question is how much of the useful search traffic is like the latter vs. the former. My suspicion is "a lot".