Many SWE-bench-passing PRs would not be merged: https://news.ycombinator.com/item?id=47341645
Top model SWE-bench scores may be skewed by Git history leaks: https://news.ycombinator.com/item?id=45214670