Hacker News
by hackerlight 836 days ago
What's the cost per inference relative to H100? Isn't that the number to care about?
2 comments

Based on some rough, conservative ballpark estimates (one server with 2 A100s at $50,000; 50 tokens/s on one of those servers; so 10 of those servers), the upfront cost with consumer hardware seems to be 1/10 to 1/20 of what the Groq hardware costs. I would guess that realistically cloud providers can probably achieve half to 1/3 of that price.

So unless you need the low latency of Groq, consumer hardware seems to be a lot cheaper for the same throughput.
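
To make the ballpark arithmetic above explicit, here is a rough sketch in Python. The server price and per-server speed come from the estimate above. The 500 tokens/s target is an assumption, inferred only from "50 tokens/s per server, so 10 servers". The Groq figures are not quoted prices: they are just what the 1/10 to 1/20 ratio would imply. All names are made up for illustration.

```python
import math

# Figures taken from the estimate above.
SERVER_PRICE_USD = 50_000   # one server with 2 A100s
PER_SERVER_TPS = 50         # tokens/s on one such server

# Assumption: target throughput inferred from "so 10 of those servers".
TARGET_TPS = 500


def servers_needed(target_tps: float, per_server_tps: float) -> int:
    """Number of servers required to reach the target throughput."""
    return math.ceil(target_tps / per_server_tps)


def upfront_cost(target_tps: float, per_server_tps: float, server_price: int) -> int:
    """Upfront hardware cost to reach the target throughput."""
    return servers_needed(target_tps, per_server_tps) * server_price


gpu_cost = upfront_cost(TARGET_TPS, PER_SERVER_TPS, SERVER_PRICE_USD)

# If the GPU setup is 1/10 to 1/20 of the Groq cost, the implied Groq
# hardware cost is 10x to 20x the GPU cost. These are derived numbers,
# not quoted prices.
implied_groq_low = gpu_cost * 10
implied_groq_high = gpu_cost * 20

print(f"servers needed: {servers_needed(TARGET_TPS, PER_SERVER_TPS)}")
print(f"GPU upfront cost: ${gpu_cost:,}")
print(f"implied Groq cost: ${implied_groq_low:,} to ${implied_groq_high:,}")
```

Note this only compares upfront hardware cost for aggregate throughput. It says nothing about per-request latency, power, or utilization, which would all shift a real cost-per-inference number.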

If you believe the marketing material it’s lower. Their API is the cheapest around, so either it’s true or they’re subsidizing.
Another consideration: Even if it's slightly more expensive, that can be OK if you care about inference speed. I'd pay 50% more for GPT-4 if it could deliver results that quickly.