Hacker News
FrontierSWE – Benchmark for long horizon coding tasks (github.com)
1 point by pHequals7 56 days ago
1 comment

Interesting work from Proximal - love the focus on out-of-distribution tasks like git to Zig.