Hacker News


	by theOGognf 417 days ago
	Some anecdotal data, but we recently estimated the cost of running an LLM at $WORK by looking at power usage over a bursty period of requests from our internal users, and it came out on the order of tens of dollars per million tokens. We aren't a big place, and our servers weren't at max load, so I can see the cost being much lower at scale.
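The power-based estimate above boils down to simple arithmetic: energy used over the measurement window, times the electricity price, divided by the tokens served in that window. A minimal sketch, assuming a flat per-kWh tariff and an average power reading for the whole window; the function name and every number in the example are hypothetical, not the commenter's measurements:

```python
def power_cost_per_million_tokens(avg_power_watts: float,
                                  duration_hours: float,
                                  price_per_kwh: float,
                                  tokens_served: int) -> float:
    """Electricity cost in dollars per million tokens over a measured window."""
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    # Energy consumed during the window, in kilowatt-hours.
    energy_kwh = avg_power_watts * duration_hours / 1000.0
    # Dollar cost of that energy, assuming a flat tariff (hypothetical).
    energy_cost = energy_kwh * price_per_kwh
    # Spread the cost over the tokens served and scale to one million tokens.
    return energy_cost / tokens_served * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 2 kW average draw for one hour, $0.15/kWh, 30k tokens.
    cost = power_cost_per_million_tokens(
        avg_power_watts=2000, duration_hours=1.0,
        price_per_kwh=0.15, tokens_served=30_000,
    )
    print(f"${cost:.2f} per million tokens")
```

With these made-up inputs it prints `$10.00 per million tokens`. Because a bursty window still includes idle draw between requests, serving more tokens for roughly the same energy lowers the per-token figure, which is consistent with the intuition that the cost drops at scale.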

exceptione 417 days ago

This is only the power usage?

theOGognf 417 days ago

Right, this is only power usage. Factoring in labor and everything else would certainly make it more expensive. However, it's not a complex system to maintain. We use a popular inference server and just run it with some modest rate limits. It's been hands-off for close to a year at this point.

exceptione 417 days ago

Ok! What hardware do you run? I had thought that would be the most expensive part.

dist-epoch 417 days ago

Hardware spend also needs to be amortized (over 1 year? 2 years?), unless you rent in the cloud.

jenny91 417 days ago

A 5-year amortization is pretty realistic, I'd say. A100s (came out 2020Q1) are still in heavy use. (I think V100s from 2017Q3 are starting to be phased out a fair bit.)
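The choice of amortization period discussed above can be folded into a per-token figure by spreading the purchase price over every token the hardware produces during that period. A minimal sketch, assuming straight-line amortization with no resale value and a constant average throughput and utilization; the function name, hardware price, throughput, and utilization below are all hypothetical illustration values:

```python
HOURS_PER_YEAR = 365 * 24


def amortized_hw_cost_per_million_tokens(hardware_cost: float,
                                         years: float,
                                         tokens_per_second: float,
                                         utilization: float) -> float:
    """Straight-line hardware amortization, in dollars per million tokens."""
    if years <= 0 or tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("years and tokens_per_second must be positive; "
                         "utilization must be in (0, 1]")
    # Tokens produced over the whole amortization period at the average utilization.
    lifetime_tokens = tokens_per_second * utilization * 3600 * HOURS_PER_YEAR * years
    # Spread the purchase price over those tokens and scale to one million.
    return hardware_cost / lifetime_tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical server: price, throughput, and utilization are illustrative only.
    for years in (1, 2, 5):
        cost = amortized_hw_cost_per_million_tokens(
            hardware_cost=20_000, years=years,
            tokens_per_second=500, utilization=0.5)
        print(f"{years}-year amortization: ${cost:.2f} per million tokens")
```

Under these assumptions the per-token hardware cost scales inversely with the amortization period, so stretching it from 1 year to 5 years cuts that component by 5x. Power and labor would be added on top.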

theOGognf 417 days ago

That is true too
