https://docs.mistral.ai/platform/pricing

Pricing has been released too. Per 1 million output tokens:

Mistral-medium: $8
Mistral-small: $1.94
gpt-3.5-turbo-1106: $2
gpt-4-1106-preview: $30
gpt-4: $60
gpt-4-32k: $120

This suggests that they're reasonably confident the mistral-medium model is substantially better than GPT-3.5, since they're charging four times gpt-3.5-turbo-1106's output price for it.
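To make the comparison concrete, here is a small sketch that expresses each listed output price as a multiple of gpt-3.5-turbo-1106. The prices are copied from the list above. Choosing gpt-3.5-turbo-1106 as the baseline is just one way to read the list, and the names `PRICES_PER_M_OUTPUT`, `BASELINE` and `price_ratios` are hypothetical, introduced only for this example.

```python
# Output-token prices in USD per 1M tokens, as listed in the comment.
PRICES_PER_M_OUTPUT = {
    "Mistral-medium": 8.00,
    "Mistral-small": 1.94,
    "gpt-3.5-turbo-1106": 2.00,
    "gpt-4-1106-preview": 30.00,
    "gpt-4": 60.00,
    "gpt-4-32k": 120.00,
}

# Hypothetical choice of baseline for the comparison.
BASELINE = "gpt-3.5-turbo-1106"


def price_ratios(prices, baseline):
    """Return each model's price as a multiple of the baseline model's price."""
    base = prices[baseline]
    return {name: price / base for name, price in prices.items()}


if __name__ == "__main__":
    ratios = price_ratios(PRICES_PER_M_OUTPUT, BASELINE)
    # Print models from cheapest to most expensive relative to the baseline.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:20s} {ratio:6.2f}x")
```

By this measure Mistral-medium sits at 4x the GPT-3.5 baseline but well under GPT-4's 15x to 60x, which is the gap the pricing argument rests on.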
I just did some napkin math: inference on a 30B model with an RTX 4090 should get you about 30 tokens/sec [1], or roughly 100k tokens/hour.
Since such a system draws about 1 kW under load, that works out to about 10 kWh per 1M tokens.
At current electricity prices of roughly $0.20-0.40/kWh, I don't think anyone could get below $2-4 per 1M tokens for a 30B model.
[1] https://old.reddit.com/r/LocalLLaMA/comments/13j5cxf/how_man...
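The napkin math above can be written out as a short sketch. The 30 tokens/sec and 1 kW figures come from the comment. The $0.20 and $0.40/kWh rates are an assumption: they are simply the electricity prices implied by turning ~10 kWh into $2-4. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def tokens_per_hour(tokens_per_sec):
    """Throughput over one hour of continuous generation."""
    return tokens_per_sec * SECONDS_PER_HOUR


def kwh_per_million_tokens(tokens_per_sec, system_kw):
    """Energy needed to generate 1M tokens at a constant power draw."""
    hours = 1_000_000 / tokens_per_hour(tokens_per_sec)
    return hours * system_kw


def electricity_cost_per_million(tokens_per_sec, system_kw, usd_per_kwh):
    """Electricity-only cost of 1M tokens (no hardware amortization)."""
    return kwh_per_million_tokens(tokens_per_sec, system_kw) * usd_per_kwh


if __name__ == "__main__":
    tps, kw = 30, 1.0  # figures from the comment
    print(f"{tokens_per_hour(tps):,} tokens/hour")
    print(f"{kwh_per_million_tokens(tps, kw):.2f} kWh per 1M tokens")
    # Assumed electricity rates, implied by the $2-4 estimate.
    for rate in (0.20, 0.40):
        cost = electricity_cost_per_million(tps, kw, rate)
        print(f"${rate:.2f}/kWh -> ${cost:.2f} per 1M tokens")
```

Without the rounding, 30 tokens/sec is 108k tokens/hour, so the energy comes to about 9.3 kWh rather than 10, and the cost to roughly $1.85-3.70. That is still in the same ballpark as the $2-4 estimate, and it counts electricity only.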