Hacker News
by
verdverm
107 days ago
Show me good evals demonstrating that it actually makes a difference.
That is the opposite of what I see.
wolfejam
107 days ago
ETH Zurich tested this: LLM-generated prose context gave -3% performance at +20% cost. Even human-written context gave only +4% at +19% cost. The problem is prose bloat. Structured formats avoid that by design.
https://arxiv.org/abs/2602.11988
verdverm
107 days ago
There is research that shows the opposite. A literature survey will show you something different than a single paper.
wolfejam
107 days ago
ETH Zurich studied 5,694 PRs across 12 diverse repos.