by cameronh90
558 days ago
At scale, you are doing the same thing with humans too. LLMs seem to have an error rate similar to humans for the majority of simple, boring tasks, if not a bit better, since they don't get distracted and start copying and pasting their previous answers. The difference with LLMs is that they simply cannot (currently) do the most complex tasks that some humans can, and when they do produce erroneous output, the errors aren't very human-like. We can all understand a cut-and-paste error, so we don't hold it against the operator, but making up sources feels like a lie and breeds distrust.
This is the big point missed by the frequent comments on here wondering whether LLMs are a fad, or claiming that in their current state they cannot be used to replace humans in non-trivial real-world business workflows. In fact, even 1.5 years ago, at the time of GPT-3.5, the technology was already good enough.
The yardstick is the performance of humans in the real world on a specific task. Humans who are often tired, have a cold, are distracted, or are going through a divorce. Humans who, even when in great condition, make plenty of mistakes.
I guess a lot of developers struggle with understanding this because so far, when software has replaced humans, it was software that on the face of it (though often not in practice) did not make mistakes if bug-free. But that has never been necessary for software to replace humans, hence buggy software still succeeding in doing so. Of course, software often replaces humans for cost reasons even when it's worse at a task.
LLMs are at the very least competitive with, if not better than, doctors at diagnosing illnesses [1].
[1] https://www.nytimes.com/2024/11/17/health/chatgpt-ai-doctors...