by ARandumGuy 1132 days ago
This reminds me of all the hype for self-driving cars a few years back. Self-driving systems performed well for 95% of driving, and it seemed like only a matter of time before the last 5% was ironed out.

Turns out, the last 5% was both extremely difficult and extremely important. A self-driving car that randomly makes dangerous maneuvers isn't desirable. Similarly, an LLM that occasionally outputs plausible-sounding bullshit quickly turns from a useful tool into something actively harmful.

1 comment

As far as I understand, an LLM that gives correct answers 95% of the time is much more useful than a car that doesn't crash 95% of the time (if you need to pay attention to correct its mistakes, you may as well be driving).

A 95% correct LLM might be utter garbage in some areas but nearly flawless (and thus reliable) in others: menial, time-consuming tasks such as summarization, rewording, providing new ideas, etc.

Also, a 95% correct LLM is arguably as good as or better than a human doing a similar task.