Hacker News
by tikkun 1055 days ago
Couple things, mostly pricing and availability:

1) Margins. Public cloud investors expect a certain margin profile. They can’t compete with Lambda/Fluidstack’s margins.

2) Networking. To an extent, the big clouds also have worse networking for LLM training. I believe only Azure has InfiniBand. Oracle is 3200 Gbps but not InfiniBand, and I believe the same goes for AWS. I'm not sure about GCP, but I believe their A100 networking speeds were only 100 Gbps rather than 1600. Lambda, Fluidstack and CoreWeave, by contrast, all have InfiniBand.

3) Availability. Nvidia isn’t giving big clouds the allocation they want.

What is your differentiator from Lambda? That you are smaller and in a single DC?

Sincere question.

I'm not OP/submitter, but the main differentiator is that Lambda doesn't have on-demand availability for lots of interlinked H100s - you have to reserve them.

Lambda has "Lambda Sprint" which is kinda similar,[1] but Sprint is $4.85/GPU/hr instead of <$2.

So if you want 128 GPUs for a week, you can't use Lambda reserved (3-year term) and you can't use Lambda on-demand (you can't get 128 A100s/H100s on-demand). Your options are Lambda Sprint or SF Compute, and SF Compute is offering significantly lower prices.

[1]: https://lambdalabs.com/service/gpu-cloud/reserved
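
To put rough numbers on the 128-GPUs-for-a-week example, here is a back-of-the-envelope sketch. It uses only the two per-GPU-hour rates quoted above; since the lower price is stated as "<$2", the $2.00 figure is an upper bound, so the real gap would be at least this large. The function and variable names are made up for illustration, and the calculation ignores storage, networking, and any minimum-commitment terms.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def cluster_cost(gpus: int, hours: int, rate_per_gpu_hour: float) -> float:
    """Total cost in dollars for renting `gpus` GPUs for `hours` hours."""
    return gpus * hours * rate_per_gpu_hour


gpus = 128

# Rates taken from the comment above; 2.00 is an assumed upper bound for "<$2".
sprint_cost = cluster_cost(gpus, HOURS_PER_WEEK, 4.85)
lower_price_cost = cluster_cost(gpus, HOURS_PER_WEEK, 2.00)
min_savings = sprint_cost - lower_price_cost

print(f"GPU-hours:                {gpus * HOURS_PER_WEEK:,}")  # 21,504
print(f"At $4.85/GPU/hr:          ${sprint_cost:,.2f}")        # $104,294.40
print(f"At $2.00/GPU/hr (bound):  ${lower_price_cost:,.2f}")   # $43,008.00
print(f"Difference (at least):    ${min_savings:,.2f}")        # $61,286.40
```

So for a one-week, 128-GPU job the quoted rates differ by roughly $61k, more if the actual price is meaningfully below $2.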

Low margins and “will this thing still be around in 2 years” are negatively correlated.

Where’s the capital for upgrades, repairs, and replacements coming from?

Using investors' money to build something with low to zero margin until you capture enough value to make it profitable a few years down the line has been the core SV strategy for more than a decade now, so it's not an extraordinary plan.

Of course it doesn't always work, and it may be even harder to make it work in the current macroeconomic environment, but it's still a pretty standard play.