Hacker News
Core-Bench: Computational Reproducibility Agent Benchmark (arxiv.org)
1 point by randomwalker 631 days ago