| HN Mirror

	by loumaciel 108 days ago
	Happy to answer questions about the sandboxing, artifact format, or the benchmark setup. The benchmark harness and datasets are in the repo if anyone wants to reproduce or extend the tests. Curious if others have run into the same context compaction issues with tool-heavy agents.