by stratos123
106 days ago
A coding agent runs near-constantly, so of course it needs a lot more compute than, say, running a multi-minute query with a thinking model every hour. Exactly how much is hard to calculate because it takes some guesswork, but here's a rough estimate.

For a long input of n tokens to a model with N active parameters, the cost should scale as O(N n^2) because of attention. For non-massive n the linear O(N n) term is bigger, which is why API prices per token are fixed up to a certain context length and only then start to rise.

The estimates in [1] put a query with n=100k input tokens on a model with N=100B active parameters (i.e. a gpt-4o-sized model) at around 40 Wh, which is 40 × 3600 J / 100k tokens ≈ 1.44 J/token. I multiply by 2.5 to account for Opus probably being ~2.5x larger than gpt-4o, and by 2 to pessimistically assume we're always close to Opus's soft context limit of 200k (in the quadratic regime, per-token cost grows linearly with context length, so doubling n doubles it). A bigger context is available for extra cost, but I suspect people compact aggressively to avoid needing it. That gives about 7.2 J/token, which at a rough throughput of 20 tokens/s works out to 144 W: comparable to a powerful CPU or a mediocre GPU, and still orders of magnitude lower than a car.

[1] https://epoch.ai/gradient-updates/how-much-energy-does-chatg...
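The back-of-envelope arithmetic above can be reproduced in a few lines. This is a minimal sketch: every default is one of the comment's own rough assumptions (the ~40 Wh figure from [1], the 2.5x model-size guess, the 2x context factor, 20 tokens/s), not a measurement, and the function name `agent_power_watts` is hypothetical. The 2x context factor assumes the quadratic attention term dominates, so per-token cost scales linearly with context length.

```python
# Back-of-envelope power estimate for a coding agent.
# All defaults are rough assumptions taken from the comment, not measured values.

J_PER_WH = 3600.0  # 1 Wh = 3600 J


def agent_power_watts(
    ref_energy_wh: float = 40.0,     # assumed: estimate from [1] for a 100k-token query
    ref_tokens: int = 100_000,       # input length behind that estimate
    model_scale: float = 2.5,        # assumed: Opus ~2.5x larger than gpt-4o
    context_scale: float = 2.0,      # assumed: always near the 200k context limit
    tokens_per_second: float = 20.0, # assumed: rough agent throughput
) -> tuple[float, float]:
    """Return (joules per token, average power in watts)."""
    # Energy per token at the reference point (about 1.44 J/token).
    j_per_token_ref = ref_energy_wh * J_PER_WH / ref_tokens
    # Scale linearly for model size and (quadratic-regime) context length.
    j_per_token = j_per_token_ref * model_scale * context_scale
    # Power = energy per token * tokens per second.
    return j_per_token, j_per_token * tokens_per_second


if __name__ == "__main__":
    j, w = agent_power_watts()
    print(f"{j:.2f} J/token -> {w:.0f} W")  # prints "7.20 J/token -> 144 W"
```

Changing any single assumption (e.g. `tokens_per_second=50.0`) shows how sensitive the final wattage is to each guess.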