Hacker News new | ask | show | jobs
by dinosaurdynasty 516 days ago
The chatbot leaderboard seems to be very affected by things other than capability, like "how nice is it to talk to" and "how likely is it to refuse requests" and "how fast does it respond" etc. Flash is literally one of Google's faster models, definitely not their smartest.

Not that the leaderboard isn't useful, I think "is in the top 10" says a lot more than the exact position in the top 10.

1 comments

I mean, sure, none of these models are being optimized for being the top of the leader board. They aren't even being optimized for the same things, so any comparison is going to be somewhat questionable.

But the claim I'm refuting here is "It's extremely cheap, efficient and kicks the ass of the leader of the market", and I think the leaderboard being topped by a cheap google model is pretty conclusive that that statement is not true. Is competitive with? Sure. Kicks the ass of? No.

google absolutely games for lmsys benchmarks with markdown styling. r1 is better than google flash thinking, you are putting way too much faith in lmsys