Hacker News
by
ekojs
430 days ago
I think it's most illustrative to see the sample battles (H2H) that LMArena released [1]. The outputs of Meta's model are too verbose and too 'yappy' IMO. And looking at the verdicts, it's no wonder why people are discounting LMArena rankings.
[1]:
https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03...
2 comments
smeeth
430 days ago
In fairness, 4o was like this until very recently. I suspect it comes from training on CoT data from larger models.
ed
430 days ago
Yep, it’s clear that many wins are due to Llama 4’s lowered refusal rate, which is an effective form of Elo hacking.