Hacker News new | ask | show | jobs
by 3abiton 60 days ago
Benchmarking has been already known to be far from a signal of quality for LLMs, but it's the "best" standardized way so far. Few exists like the food truck and the svg test. At the end of the day, there is only 1 way: having your own benchmark for your own application.