|
> At scale, you are doing the same thing with humans too. LLMs seem to have an error rate similar to humans for the majority of simple, boring tasks, if not even a bit better since they don't get distracted and start copying and pasting their previous answers. This is the big one missed by the frequent comments on here wondering whether LLMs are a fad, or claiming in their current state they cannot be used to replace humans in non-trivial real-world business workflows. In fact, even 1.5 years ago at the time of GPT 3.5, the technology was already good enough. The yardstick is the peformance of humans in the real world on a specific task. Humans, often tired, having a cold, distracted, going through a divorce. Humans who even when in a great condition make plenty of mistakes. I guess a lot of developers struggle with understanding this because so far when software has replaced humans, it was software that on the face of it (though often not in practice) did not make mistakes if bug-free. But that has been never been necessary for software to replace humans - hence buggy software still succeeding in doing so. Of course, often software even replaces humans when it's worse at a task for cost reasons. They're at the very least competitive, if not better than, doctors at diagnosing illnesses [1]. [1] https://www.nytimes.com/2024/11/17/health/chatgpt-ai-doctors... |
If we were both the same gender, I probably would have had my skull opened up for no reason, and she would have been discharged and later died.