|
|
|
|
|
by paulddraper
82 days ago
|
|
Note this is also assuming you (1) Rent your GPUs. (2) Pay list price, no volume breaks. (3) Get only 85 tokens/sec. Realistically, frontier models would attain 200+ tokens/second amortized. Inference is extremely profitable at scale. |
|
You're generating about 36 million tokens/hour. Cost of Mixtral 8x7b on Open router is $0.54/M input tokens. $0.54/M output tokens.
You're looking at potentially $38.88/hour return on that H100 GPU. This is probably the best case scenario.
In reality, inference providers will use multiple GPUs together to run bigger, smarter models for a higher price.