|
|
|
|
|
by mkolodny
441 days ago
|
|
“Got caught” is a misleading way to present what happened. According to the article, Meta publicly stated, right below the benchmark comparison, that the version of Llama on LMArena was the experimental chat version: > According to Meta’s own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality” The AI benchmark in question, LMArena, compares Llama 4 experimental to closed models like ChatGPT 4o latest, and Llama performs better (https://lmarena.ai/?leaderboard). |
|