| HN Mirror


by shihab 549 days ago

Here [1] is the leaderboard from Chatbot Arena, where users vote on the outputs of two anonymous models. DeepSeek R1 needs more data points, but it has already climbed to No. 1 in the Style Control ranking, which is pretty impressive.

Link [2] goes to the results on more standard LLM benchmarks, which they conveniently placed on the first page of the paper.

[1] https://lmarena.ai/?leaderboard

[2] https://arxiv.org/pdf/2501.12948 (PDF)