by ben_w 740 days ago
Indeed, and this is also the general problem with most current ways to evaluate AI: by every test there's at least one model which looks wildly superhuman, but actually using them reveals they're book-smart at everything without having any street-smarts.

The difference between expectation and reality is tripping people up in both directions: a nearly-free everything-intern is still very useful, but treating LLMs* as experts (or as capable of meaningful on-the-job learning, if you're not fine-tuning the model) is a mistake.

* Special-purpose AI like Stockfish, however, should be treated as experts.