Hacker News
by lilyJeon 31 days ago
Honestly, the numbers are becoming increasingly difficult to interpret. Every time a new version comes out, they just call it the "best." It would be much more useful to directly compare performance on tasks people actually use it for, such as coding and summarization.