by bahaAbunojaim
168 days ago
Haven’t done evals yet, but I measured it on a few real-world situations where projects got stuck and the brainstorm mode solved it. Running evals is definitely worth doing, and contributions are welcome. I think what really degrades the output is context length relative to the context window limit; check out NoLiMa.
|
> coordination yields diminishing or negative returns once single-agent baselines exceed ~45%
This is going to be the big thing to overcome, and without actually measuring it all we're doing is AI astrology.