Hacker News
by theOGognf 368 days ago
Some anecdotal data, but we recently estimated the cost of running an LLM at $WORK by looking at power usage over a bursty period of requests from our internal users, and it was on the order of $10s/mil tokens. And we aren't a big place, nor were our servers at max load, so I can see the cost being much lower at scale.

This is only the power usage?
Right, this is only power usage. Factoring in labor and all that would make it more expensive for sure. However, it’s not like it’s a complex system to maintain. We use a popular inference server and just run it with some modest rate limits. It’s been hands-off for close to a year at this point.
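For anyone wanting to repeat this kind of power-only estimate, here is a minimal sketch in Python. It assumes you can measure average power draw over a request window and count the tokens served in that window. The function name, parameters, and all sample numbers are hypothetical and are not the figures from the comment above.

```python
def power_cost_per_million_tokens(avg_power_watts, window_hours,
                                  tokens_served, price_per_kwh):
    """Electricity-only cost per one million tokens.

    All inputs are assumptions supplied by the caller:
    avg_power_watts -- average server power draw during the window
    window_hours    -- length of the measurement window in hours
    tokens_served   -- tokens generated during the window
    price_per_kwh   -- electricity price in dollars per kWh
    """
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    energy_kwh = avg_power_watts / 1000.0 * window_hours  # W -> kW, times hours
    window_cost = energy_kwh * price_per_kwh              # dollars spent on power
    return window_cost / tokens_served * 1_000_000        # scale to 1M tokens


if __name__ == "__main__":
    # Made-up numbers: 1 kW average draw for 2 hours, 500k tokens, $0.15/kWh.
    cost = power_cost_per_million_tokens(1000, 2, 500_000, 0.15)
    print(f"${cost:.2f} per million tokens (power only)")
```

This leaves out labor, hardware, cooling, and networking, so it is a lower bound on the real cost rather than a full accounting.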
Ok! What hardware do you run? I had thought that would be the most expensive part.
Hardware spend also needs to be amortized (over 1 year? 2 years?), unless you rent from the cloud.
5-year amortization is pretty realistic, I'd say. A100s (came out 2020Q1) are still in heavy use. (I think V100s from 2017Q3 are starting to be phased out a fair bit.)
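To see how much the amortization period matters, here is a rough sketch in Python. It assumes straight-line amortization with no resale value and a constant yearly token volume. The function name, purchase price, and throughput are hypothetical placeholders, not data from this thread.

```python
def hardware_cost_per_million_tokens(purchase_price, amortization_years,
                                     tokens_per_year):
    """Straight-line amortized hardware cost per one million tokens.

    Assumes zero resale value and a constant yearly token volume;
    both are simplifications.
    """
    if amortization_years <= 0 or tokens_per_year <= 0:
        raise ValueError("years and tokens_per_year must be positive")
    yearly_cost = purchase_price / amortization_years  # dollars per year
    return yearly_cost / tokens_per_year * 1_000_000   # scale to 1M tokens


if __name__ == "__main__":
    # Made-up numbers: a $20,000 server serving 1 billion tokens per year.
    for years in (1, 2, 5):
        cost = hardware_cost_per_million_tokens(20_000, years, 1_000_000_000)
        print(f"{years}-year amortization: ${cost:.2f} per million tokens")
```

Under these assumptions the per-token hardware cost scales inversely with the amortization period, so going from 1 year to 5 years cuts it to a fifth.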
That is true too