| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Oxlamarr 54 days ago
	The harness point is probably the most important part here. With terminal agents, it often feels like the benchmark is measuring the whole loop: model, prompt, tool interface, retry policy, timeout handling, and recovery from bad shell commands. Do you have a sense of which part contributed most to the jump?