by MutedEstate45 321 days ago
Interesting approach, but I'm curious about the practical cost considerations. A 1,000-agent simulation could easily be hundreds of thousands of API calls. The repo recommends gpt-4o-mini over gpt-4 and supports local Llama models, but there's no guidance on the performance trade-offs.

Would love to see cost-per-experiment breakdowns and quality benchmarks across model tiers. Does a local Llama 3.1 8B produce meaningful economic simulations or do you need the reasoning power of frontier models? This could be the difference between $5 and $500 experiments.

Using smaller, cheaper agents is one of the goals of the work. There is a Pareto frontier, though: smaller, faster, cheaper agents need more steps to converge. We touch on this briefly in the paper.
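To make the trade-off concrete, here is a minimal sketch of how one might pick out the Pareto-optimal configurations once per-step cost and steps-to-converge have been measured. The configuration names and numbers are made up purely for illustration; they are not results from the paper or the repo.

```python
def pareto_front(points):
    """Return names of configs that no other config dominates.

    Each point is (name, cost_per_step, steps_to_converge); lower is better on both.
    """
    front = []
    for name, cost, steps in points:
        dominated = any(
            c <= cost and s <= steps and (c < cost or s < steps)
            for _, c, s in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical measurements, illustration only.
configs = [
    ("large", 1.00, 100),
    ("medium", 0.30, 250),
    ("small", 0.05, 900),
    ("small-badly-tuned", 0.05, 1500),
]

print(pareto_front(configs))
```

Every configuration left on the front is a defensible choice; which one to use depends on whether the budget or the wall-clock time is the binding constraint.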
Thanks. That Pareto trade-off is exactly what I'm trying to quantify, not just qualify. For example, if I've got a $50 budget, what's the sweet spot?

Scenario A: 100 agents × GPT-4o-mini × 500 steps
Scenario B: 500 agents × local Llama 3-8B × 1,000+ steps

A quick table like "X agents × Y model × Z steps → tokens, $, convergence score" in the README would let new users budget experiments without having to read the whole paper plus run expensive experiments just to discover basic resource planning.
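A rough sketch of the kind of estimator that could generate such a table. Everything numeric here is an assumption: one API call per agent per step, the per-call token counts, and the per-million-token prices are placeholders rather than figures from the repo, the paper, or any provider's price list. A local model is priced at $0 per token, which ignores GPU time, and the convergence score is left out because it has to be measured, not computed.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    agents: int
    steps: int
    usd_per_million_input: float   # placeholder price; check your provider
    usd_per_million_output: float  # placeholder price; 0.0 for a local model


def estimate(s, calls_per_agent_step=1, input_tokens=800, output_tokens=150):
    """Back-of-the-envelope call, token, and dollar counts for one run."""
    calls = s.agents * s.steps * calls_per_agent_step
    tokens_in = calls * input_tokens
    tokens_out = calls * output_tokens
    usd = (tokens_in * s.usd_per_million_input
           + tokens_out * s.usd_per_million_output) / 1_000_000
    return {"calls": calls, "input_tokens": tokens_in,
            "output_tokens": tokens_out, "usd": usd}


scenarios = [
    Scenario("A: 100 agents, hosted small model, 500 steps", 100, 500, 0.15, 0.60),
    Scenario("B: 500 agents, local model, 1,000 steps", 500, 1000, 0.0, 0.0),
]

for s in scenarios:
    r = estimate(s)
    print(f"{s.name}: {r['calls']:,} calls, "
          f"{r['input_tokens'] + r['output_tokens']:,} tokens, ${r['usd']:.2f}")
```

Swapping in measured token counts per call and current prices turns this into the README table; the missing column is still how many steps each model needs before the simulation converges.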

We ran each method in under 24 hours on a single H100. I understand your point, and I think we will include this in future iterations of our work, since it is very interesting from the user's perspective. In the paper, though, we focus more on algorithmic concerns.
I'll look out for future iterations. Thanks and good luck with the paper.