| Not understanding why this is an issue for LLMs but not humans. This is a simple commercial decision to make governed by three factors. 1. What is the cost of making an error? 2. What is the cost of the human doing the work? 3. What is the likelihood of the human making an error? It's just evaluating how much more likely AI is to make an error than a human, by the cost of that error, set against the savings by using fewer humans. Look at the legal profession. Sometimes the cost of an error is high, but usually it is not. There are already tons of little errors in contracts and discovery, and today they're all human. And people are very expensive. There is a giant swath of legal work that looks very attractive to automate at less than 100% accuracy. Customer service: people offer poor customer service all the time, and usually the cost of that error is low. Human customer service isn't as expensive as legal work, but it's still relatively expensive. Very attractive to automate at less than 100% accuracy. |
Because humans are capable of understanding where their information comes from, and so can give enough meta-information for someone else to judge how accurate an answer is likely to be, even if not all of them are good at it all the time.
I understand that there has been some effort to build this capability into LLMs, and that it works a little bit for some of them, but it is not something that most of them are fundamentally capable of.