Hacker News
by DanielHall 85 days ago
These small models, having been fine-tuned for the test, achieve frighteningly high scores, yet perform abysmally in real-world scenarios.