Hacker News
DeepSWE results are unreliable – 3/3 DSv4 "failed" tasks solved with same model (github.com)
3 points by theanonymousone 7 days ago