| HN Mirror



	by 34679 296 days ago
	"By deploying this implementation locally, it translates to a cost of $0.20/1M output tokens" Is that just the cost of electricity, or does it include the cost of the GPUs spread out over their predicted lifetime?

3 comments

zipy124 296 days ago

This is all costs included. That's 22k tokens per second per node, so per 8 H100s. With 12 nodes they get 264k tokens per second, or 950 million an hour. This gets you to roughly $0.2021 per million at $2 an hour for an H100, which is what they go for on services such as runpod.io (cheaper if not paying spot price, and with volume discounts).
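
A minimal sketch of the arithmetic in the comment above, using only the figures quoted in the thread (22k tokens per second per node, 12 nodes, 8 H100s per node, and the commenter's estimate of $2 per GPU-hour). The function and constant names are hypothetical, chosen for illustration. Without rounding the hourly total down to 950 million, the result comes out at about $0.202 per million tokens.

```python
# Figures quoted in the thread; the $2/hour H100 rate is the commenter's estimate.
TOKENS_PER_SEC_PER_NODE = 22_000
NODES = 12
GPUS_PER_NODE = 8
USD_PER_GPU_HOUR = 2.0


def cost_per_million_tokens(tokens_per_sec_per_node, nodes, gpus_per_node, usd_per_gpu_hour):
    """Rental cost in USD per one million output tokens for the whole cluster."""
    tokens_per_hour = tokens_per_sec_per_node * nodes * 3600
    cluster_usd_per_hour = nodes * gpus_per_node * usd_per_gpu_hour
    return cluster_usd_per_hour / (tokens_per_hour / 1_000_000)


cost = cost_per_million_tokens(
    TOKENS_PER_SEC_PER_NODE, NODES, GPUS_PER_NODE, USD_PER_GPU_HOUR
)
print(f"${cost:.4f} per million tokens")
```

This covers rental price only; it says nothing about what the provider pays for hardware or power.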

dragonslayer56 296 days ago

“Our implementation, shown in the figure above, runs on 12 nodes in the Atlas Cloud, each equipped with 8 H100 GPUs.”

Maybe the cost of renting?

34679 296 days ago

I'm confused because I wouldn't consider a cloud implementation to be local.

randomjoe2 296 days ago

Local doesn't refer to "on metal" anymore to many people

mwcz 296 days ago

"On metal" is muddied too. I've heard people refer to web apps running in an OCI container as being "bare metal" deployment, as opposed to AWS or whatever hosting platform.

That's silly, but the idea that "local" is not the opposite of remote is even sillier.

dtech 296 days ago

If you define bare metal as not running under a VM, it fits. OCI on Linux is cgroups, so that counts as not a VM, I'd say. Or at least it's a layer closer to the metal than a typical VM running OCI images.

Is a Java app running on Linux bare metal?

ffsm8 296 days ago

You can run an OCI container on bare metal though. It doesn't stop running on bare metal just because you're running in kernel namespaces, aka a Docker container.

Lots of people were advocating for running their k8s on bare metal servers to maximize the performance of their containers.

Now whether that applies to your conversation... I've no clue, too little context ( ｡ ŏ ﹏ ŏ )

okasaki 296 days ago

In my opinion, if you're running k8s on bare metal, that's "k8s on bare metal" but still "<your app> on kubernetes", not "<your app> on bare metal".

bee_rider 296 days ago

Local doesn’t need to be “on metal,” but I’m still confused as to what they are saying. Are they running some local cloud system?

monsieurbanana 296 days ago

I missed that train

vFunct 296 days ago

My basement server is really confused by all this...

demodulation 296 days ago

The one down in your Gaza tunnels?

DSingularity 296 days ago

I guess local for him is independent/private.

ollybee 296 days ago

H100s can be $2 an hour, so $192 an hour for the full cluster. They report 22k tokens per second, so ~80 million an hour; that's $16 an hour at $0.2 per million. Maybe a bit more for input tokens, but it seems a long way off.

zipy124 296 days ago

I think you misread. That's 22k tokens per second per node, so per 8 H100s. With 12 nodes they get 264k tokens per second, or 950 million an hour. This gets you to roughly $0.2021 per million at $2 an hour.
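
To make the disagreement in this exchange concrete, here is a rough sketch comparing the two readings of the 22k tokens per second figure: as the throughput of the whole 96-GPU cluster, or as the throughput of a single 8-GPU node. It assumes the $2 per GPU-hour rate quoted in the thread; the names are hypothetical.

```python
# Assumed rental rate from the thread: $2 per H100 per hour, 12 nodes x 8 GPUs.
GPU_HOURLY_USD = 2.0
GPUS = 12 * 8
CLUSTER_USD_PER_HOUR = GPUS * GPU_HOURLY_USD  # $192/hour


def usd_per_million(tokens_per_sec):
    """Cluster rental cost per million tokens at a given total throughput."""
    millions_per_hour = tokens_per_sec * 3600 / 1_000_000
    return CLUSTER_USD_PER_HOUR / millions_per_hour


# Reading 1: 22k tokens/s is the total for the whole cluster.
whole_cluster_reading = usd_per_million(22_000)

# Reading 2: 22k tokens/s is per node, so 12 nodes give 264k tokens/s.
per_node_reading = usd_per_million(22_000 * 12)

print(f"22k as cluster total: ${whole_cluster_reading:.2f} per million")
print(f"22k per node:         ${per_node_reading:.3f} per million")
```

The two readings differ by exactly the node count, a factor of 12, which is the gap between $16 an hour of token revenue and $192 an hour of rental cost.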

adam_arthur 296 days ago

I'm curious as well.

Depreciation and GPU failure rate over time must be considered, which I don't see mentioned in the article.