Hacker News
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks (arxiv.org)
2 points by FiberBundle 78 days ago