Hacker News
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad (arxiv.org)
6 points by mauriziocalo 448 days ago | 1 comment
galaxyLogic 448 days ago
> Our results reveal that all tested models struggled significantly, achieving less than 5% on average