| HN Mirror



	by bahaAbunojaim 168 days ago
	Haven’t run evals yet, but I measured it on a few real-world situations where projects got stuck and the brainstorm mode solved it. Running evals is definitely worth doing, and contributions are welcome. I think what really degrades the output is context length relative to the context window limit; check out NoLiMa.

1 comments

danpalmer 167 days ago

https://www.arxiv.org/abs/2512.08296

> coordination yields diminishing or negative returns once single-agent baselines exceed ~45%

This is going to be the big thing to overcome, and without actually measuring it all we're doing is AI astrology.

bahaAbunojaim 167 days ago

This is why context optimization is going to be critical. Thank you so much for sharing this paper, as it also validates what we are trying to do. If we manage to keep the baseline below 40% through context optimization, then coordination might actually work very well and help with scaling agentic systems.

I agree on measuring, and it is planned, especially once we integrate the context optimization. I think the value of context optimization will go beyond avoiding compaction and reducing cost to providing more reliable agents.