User: xdotli | HN Mirror


user: xdotli
created: 2023-07-08
karma: 16

Founder of BenchFlow.ai, a benchmark company.

submissions:

A curated, non-BS library of the best resources for evaluating agents

3 points | 0 comments

0 points | 0 comments

Frontier Model Training Methodologies

2 points | 1 comment

0 points | 0 comments

ClawsBench shows GPT-5.4 tries to reward hack 80% of the time

3 points | 1 comment

0 points | 0 comments

1 point | 1 comment

0 points | 0 comments

Native CLI scaffolds consistently outperform OpenCode when using the same model

1 point | 1 comment

We compare model quality in Cursor

2 points | 0 comments

Automatically Learning Skills for Coding Agents

4 points | 0 comments

We Reached 74.8% on terminal-bench with Terminus-KIRA

2 points | 0 comments

0 points | 0 comments

0 points | 0 comments

Self-generated skills don't do much for AI agents, but human-curated skills do

2 points | 3 comments

0 points | 0 comments

0 points | 0 comments

First Agent Skills Hackathon by the Authors of SkillsBench

2 points | 1 comment

0 points | 0 comments

0 points | 0 comments

The First Agent Skills Benchmark

1 point | 1 comment

0 points | 0 comments

0 points | 0 comments

0 points | 0 comments

0 points | 0 comments

0 points | 0 comments

0 points | 0 comments

0 points | 0 comments

GPT-5.2 got worse on Terminal Bench 2.0, so is GPT-5.2 Pro

1 point | 1 comment