| HN Mirror



	by verdverm 154 days ago
	Show me good evals where it actually makes a difference. That is the opposite of what I see.


wolfejam 154 days ago

ETH Zurich tested this: LLM-generated prose context gave -3% performance at +20% cost. Even human-written context gave only +4% at +19% cost. The problem is prose bloat. Structured formats avoid that by design. https://arxiv.org/abs/2602.11988


verdverm 154 days ago

There is research that shows the opposite. A literature survey will show you something different than a single paper.


wolfejam 154 days ago

ETH Zurich studied 5,694 PRs across 12 diverse repos.
