Hacker News
by Bnjoroge 9 days ago
I personally don't put any weight on DeepSWE. Other than 5.5 being directionally the best model, it gets the others pretty wrong in my experience. FrontierCode from Cognition looks interesting.