Hacker News
LLM Speedrunner: Eval for frontier models to reproduce scientific findings (github.com)
2 points by zerojames 356 days ago