Hacker News
user: xdotli
created: 2023-07-08
karma: 15
about: Founder of BenchFlow.ai, a benchmark company.
submissions:
0 points | 0 comments
Frontier Model Training Methodologies
2 points | 1 comment
0 points | 0 comments
ClawsBench shows GPT-5.4 tries to reward hack 80% of the time
3 points | 1 comment
0 points | 0 comments
Chaos of Agent
1 point | 1 comment
0 points | 0 comments
Native CLI scaffolds consistently outperform OpenCode when using the same model
1 point | 1 comment
We compare model quality in Cursor
2 points | 0 comments
Automatically Learning Skills for Coding Agents
4 points | 0 comments
We Reached 74.8% on terminal-bench with Terminus-KIRA
2 points | 0 comments
0 points | 0 comments
0 points | 0 comments
Self-generated skills don't do much for AI agents, but human-curated skills do
2 points | 3 comments
0 points | 0 comments
0 points | 0 comments
First Agent Skills Hackathon by the Authors of SkillsBench
2 points | 1 comment
0 points | 0 comments
0 points | 0 comments
The First Agent Skills Benchmark
1 point | 1 comment
0 points | 0 comments
0 points | 0 comments
0 points | 0 comments
0 points | 0 comments
0 points | 0 comments
0 points | 0 comments
0 points | 0 comments
GPT-5.2 got worse on Terminal Bench 2.0, so is GPT-5.2 Pro
1 point | 1 comment
Claude Skills as a Meta Tool
2 points | 0 comments