by ben_w
740 days ago
Indeed, and this is also the general problem with most current ways to evaluate AI: on every test there's at least one model that looks wildly superhuman, but actually using these models reveals they're book-smart at everything without having any street-smarts. The gap between expectation and reality is tripping people up in both directions: a nearly-free everything-intern is still very useful, but treating LLMs* as experts (or as capable of meaningful on-the-job learning if you're not fine-tuning the model) is a mistake.

* Special-purpose AI like Stockfish, however, should be treated as experts.