user: lieret
created: 2025-07-24
karma: 24
submissions:
Show HN: New Benchmark from SWE-bench team is 0% solved
24 points | 3 comments
Show HN: All the LM solutions on SWE-bench are bloated compared to humans
1 point | 0 comments
Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets
5 points | 1 comment
Show HN: Randomly switching between LMs at every step boosts SWE-bench score
5 points | 1 comment
GPT-5 on SWE-bench: Cost and performance deep-dive
4 points | 3 comments
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds
2 points | 0 comments
Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of Python
7 points | 4 comments