Hacker News
user: lieret
created: 2025-07-24
karma: 24

submissions:

0 points | 0 comments
Show HN: New Benchmark from SWE-bench team is 0% solved
24 points | 3 comments
Show HN: All the LM solutions on SWE-bench are bloated compared to humans
1 point | 0 comments
Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets
5 points | 1 comment
0 points | 0 comments
0 points | 0 comments
0 points | 0 comments
Show HN: Randomly switching between LMs at every step boosts SWE-bench score
5 points | 1 comment
0 points | 0 comments
0 points | 0 comments
GPT-5 on SWE-bench: Cost and performance deep-dive
4 points | 3 comments
0 points | 0 comments
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds
2 points | 0 comments
0 points | 0 comments
Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of Python
7 points | 4 comments