by dakolli
46 days ago
That's not possible; read my comment above. These are private companies, and there are no public filings on their profitability in any sense. You're just making things up. If you have a machine running at 150 tok/s, you can only make about $5,830 a month at $15 per 1M tokens running 24/7. It costs a hell of a lot more than $6k a month to run Claude 4.7 at 150 tok/s on that machine 24/7. This math is a bit off because you have input tokens too, but regardless, it's still not profitable, especially given how long it takes to turn around a request, and the caching is probably not all that profitable.
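For anyone who wants to check the back-of-envelope figure, here is a minimal sketch of the arithmetic. It assumes a 30-day month, output tokens only, and one request stream running flat out; the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_revenue_usd(tokens_per_second, usd_per_million_tokens, days=30):
    """Revenue from one stream generating tokens non-stop for `days` days."""
    tokens = tokens_per_second * SECONDS_PER_DAY * days
    return tokens / 1_000_000 * usd_per_million_tokens


# 150 tok/s at $15 per 1M output tokens, running 24/7 (30-day month assumed)
revenue = monthly_revenue_usd(150, 15.0)
print(f"${revenue:,.0f} per month")
```

With those assumptions it comes out to $5,832, which is close to the number quoted above. A 31-day month or a different price per million tokens shifts it proportionally.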
|
The reason it works: each time you read the model weights from memory (the memory-bound part) to calculate the next token, you can also advance multiple requests (the compute-bound part) in the same pass. It's also much more energy-efficient per token.
[1] https://aimultiple.com/gpu-benchmark
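A toy model of the batching argument, as a rough sketch rather than a real benchmark: one decode step takes as long as the slower of (a) reading the weights once and (b) doing the per-request compute for the whole batch. The numbers below are made up for illustration (a weight read that gives 150 tok/s for a single stream, and per-request compute assumed to be 64x cheaper than the read), and the model ignores KV-cache traffic, which grows with batch size.

```python
def decode_step_seconds(batch_size, weight_read_s, compute_per_request_s):
    """One decode step: the weights are read once, compute scales with batch."""
    return max(weight_read_s, batch_size * compute_per_request_s)


def aggregate_tokens_per_second(batch_size, weight_read_s, compute_per_request_s):
    """Each request in the batch gets one token per decode step."""
    step = decode_step_seconds(batch_size, weight_read_s, compute_per_request_s)
    return batch_size / step


# Hypothetical hardware numbers, chosen only to illustrate the shape of the curve.
WEIGHT_READ_S = 1 / 150                     # single stream: 150 tok/s
COMPUTE_PER_REQUEST_S = WEIGHT_READ_S / 64  # assumed, not measured

for batch in (1, 8, 32, 64, 128):
    rate = aggregate_tokens_per_second(batch, WEIGHT_READ_S, COMPUTE_PER_REQUEST_S)
    print(f"batch={batch:4d}  aggregate tok/s = {rate:,.0f}")
```

In this sketch the aggregate rate grows linearly with batch size until compute catches up with the memory read, then flattens out. That is why per-machine revenue computed from a single stream's speed understates what a batched server can produce.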