Hacker News
by neversupervised 14 days ago
This is not how people use LLMs. If you asked one of these questions, you'd get a longer answer, often grounded in internet sources. My guess is that when a smart human operator interprets the results, those interpretations converge across vendors more often than this report suggests.

Even then, there are often substantive disagreements depending on context. That's why "mostly true" and "mostly false" buckets are needed at all.