by openclawai
133 days ago
For context on what cloud API costs look like when running coding agents: with Claude Sonnet at $3/$15 per 1M input/output tokens, a typical agent loop of ~2K input tokens and ~500 output tokens per call, 5 LLM calls per task, and 20% retry overhead (common with tool use) comes to roughly $0.05-0.10 per agent task. At 1K tasks/day that's ~$1.5K-3K/month in API spend. The retry overhead is where the real costs hide. Most cost comparisons assume perfect execution, but tool-calling agents hit parse failures, need validation retries, etc. I've seen retry rates push effective costs 40-60% above baseline projections. Local models that trade 50x slower inference for $0 marginal cost start looking very attractive for high-volume, latency-tolerant workloads.
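A rough sketch of the arithmetic above, in case anyone wants to plug in their own numbers. The function and parameter names are made up for illustration, the defaults are the figures quoted in the comment, and two things are assumptions on my part: a 30-day month, and retries modeled as a flat multiplier on every call.

```python
def cost_per_task(
    input_tokens=2_000,        # input tokens per LLM call (from the comment)
    output_tokens=500,         # output tokens per LLM call (from the comment)
    calls_per_task=5,          # LLM calls per agent task (from the comment)
    retry_overhead=0.20,       # assumption: retries add a flat fraction of extra calls
    input_price_per_m=3.00,    # USD per 1M input tokens
    output_price_per_m=15.00,  # USD per 1M output tokens
):
    """Estimated API cost in USD for one agent task."""
    per_call = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return per_call * calls_per_task * (1 + retry_overhead)


def monthly_cost(tasks_per_day=1_000, days=30, **task_kwargs):
    """Estimated monthly spend in USD; days=30 is an assumption."""
    return cost_per_task(**task_kwargs) * tasks_per_day * days


if __name__ == "__main__":
    print(f"per task:  ${cost_per_task():.3f}")
    print(f"per month: ${monthly_cost():,.0f}")
    # Same workload with retry overhead at 60% instead of 20%
    print(f"per month at 60% retries: ${monthly_cost(retry_overhead=0.60):,.0f}")
```

With the defaults this lands inside the $0.05-0.10 per task and $1.5K-3K/month ranges quoted above. Raising `retry_overhead` shows how quickly the bill moves when parsing and validation failures pile up.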
At 20 t/s over 1 month, that's... $19-something running literally 24/7. In reality it'd be cheaper than that.
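For reference, a small sketch of the token count behind a figure like that. It assumes a 30-day month and nonstop generation; the per-token price isn't stated in the comment, so it is left as a parameter here rather than guessed. Function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_month(tokens_per_second, days=30):
    """Tokens generated running 24/7 at a fixed rate; days=30 is an assumption."""
    return tokens_per_second * SECONDS_PER_DAY * days


def api_cost(tokens, price_per_m):
    """USD cost of buying the same tokens from an API at price_per_m per 1M tokens."""
    return tokens / 1_000_000 * price_per_m


if __name__ == "__main__":
    total = tokens_per_month(20)
    print(f"{total:,} tokens in a month at 20 t/s")
```

At 20 t/s that is about 51.8M tokens a month; pass whatever output price you are comparing against to `api_cost` to get the equivalent API bill.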
I bet you'd burn more than $20 in electricity with a beefy machine that can run DeepSeek.
The economics of batch>1 inference do not favor consumers.