by Eridrus 27 days ago
Nobody releases numbers that show them to be worse than competitors lol.

This even applies to OpenAI & Anthropic, who often don't eval on the same datasets.

1 comment

I do recall Mistral doing this. It's not always about being the best, but also about being the fastest or smallest. Users should have all the information they need for their own use case.
If your model's published numbers don't actually show the tradeoff you're getting for speed, you're doing marketing and not benchmarking.

Which is fine, we all have to make money, but it is disingenuous. It's just unfortunate that running some of these benchmarks is so expensive that it's not realistic for most companies to run them.