| HN Mirror



	by electronbeam 615 days ago
	The real money is in renting InfiniBand clusters, not individual GPUs/machines. If you look at Lambda's one-click clusters, they state $4.49/H100/hr.

3 comments

latchkey 615 days ago

I'm in the business of MI300X. This comment nails it.

In general, the $2 GPUs are either PE/venture-funded and losing money, on long contracts, sold in huge quantities, PCIe, on slow (<400G) networking, or have some other limitation, like unreliable uptime from some bitcoin miner that decided to pivot into the GPU space and has zero experience running these more complicated systems.

Basically, if you decide to build and risk your business on these sorts of providers, you "get what you pay for".


jsheard 615 days ago

> slow (<400G) networking

We're not getting Folding@Home-style distributed training any time soon, are we?


krasin 615 days ago

Distributed training-data creation & curation is more useful and feasible. Training gets 1.5x cheaper every year, but data is just as expensive, if not more so, given that the era of "free web crawls of human knowledge" is over.
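To put a number on the "1.5x every year" claim, here is a minimal sketch that compounds it over a few years. The function name and the assumption that the rate stays constant are illustrative only, not something stated in the thread.

```python
def relative_training_cost(years: float, yearly_factor: float = 1.5) -> float:
    """Cost of a fixed training run relative to today (1.0 = today's cost).

    Assumes the cost falls by a constant `yearly_factor` each year,
    which is a simplification of the claim in the comment above.
    """
    return 1.0 / (yearly_factor ** years)


if __name__ == "__main__":
    for year in range(5):
        print(f"year {year}: {relative_training_cost(year):.2f}x of today's cost")
```

Under that assumption the same run costs a bit under half as much after two years, while the data bill stays flat or grows.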


marcyb5st 615 days ago

I agree with you, but as the article mentioned, if you need to finetune a small/medium model you really don't need clusters. Getting a whole server with 8/16x H100s is more than enough. I also agree with the article when it states that most companies are finetuning some version of Llama/open-weights models today.


pico_creator 615 days ago

Exactly, it's covered in the article that there is a segmentation happening by GPU cluster size.

If it's big enough for foundation model training from scratch, it's ~$3+. Otherwise the price drops hard.

The problem is that "big enough" is a moving goalpost now: what was big becomes small.


swyx 615 days ago

so why not buy up all the little H100s and put enough together for a cluster? seems like a decent rollup strategy?

of course it would still cost a lot to do... but if the difference is $2/hr vs $4.49/hr then there's some size where it makes sense
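A rough sketch of that spread, using only the two prices quoted in the thread ($2/hr and $4.49/H100/hr). The per-GPU overhead (for example, networking) and the utilization figure are hypothetical knobs added for illustration, not numbers from the thread.

```python
STANDALONE_RATE = 2.00  # $/GPU-hr, the cheap price discussed in the thread
CLUSTER_RATE = 4.49     # $/H100-hr, the cluster price quoted in the thread


def hourly_spread(num_gpus: int,
                  overhead_per_gpu_hr: float = 0.0,
                  utilization: float = 1.0) -> float:
    """Net $/hr from renting GPUs at STANDALONE_RATE and reselling at CLUSTER_RATE.

    overhead_per_gpu_hr and utilization are hypothetical inputs
    (e.g. networking cost, share of hours actually sold).
    """
    revenue = num_gpus * CLUSTER_RATE * utilization
    cost = num_gpus * (STANDALONE_RATE + overhead_per_gpu_hr)
    return revenue - cost


def break_even_utilization(overhead_per_gpu_hr: float = 0.0) -> float:
    """Fraction of hours that must be sold at CLUSTER_RATE to cover costs."""
    return (STANDALONE_RATE + overhead_per_gpu_hr) / CLUSTER_RATE


if __name__ == "__main__":
    print(f"spread per GPU-hr, no overhead: ${hourly_spread(1):.2f}")
    print(f"break-even utilization, no overhead: {break_even_utilization():.1%}")
```

With zero overhead the spread is the plain difference between the two prices; every dollar of per-GPU overhead eats directly into it.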


ipsum2 615 days ago

Only if they're networked with InfiniBand.


pico_creator 615 days ago

Makes sense, though only folks like runpod / sfcompute / etc. have enough visibility to maybe pull this off?

It's a riskier move than just taxing the excess compute now and printing money on the margins from bag holders.


latchkey 615 days ago

Correct me if I'm wrong, but if I recall, neither of those two companies owns its own compute. They are marketplaces.


pico_creator 615 days ago

Yup, but they at least know where all these "small unused clusters" are.

Bag holders do not want to be shouting to the world that they are bag holders.


qeternity 615 days ago

I think sfcompute does own a lot or most of the current compute on their platform? Not entirely sure though.
