|
|
|
|
|
by whack
1056 days ago
|
|
> Rather than each of K startups individually buying clusters of N gpus, together we buy a cluster with NK gpus... Then we set up a job scheduler to allocate compute In theory, this sounds almost identical to the business model behind AWS, Azure, and other cloud providers. "Instead of everyone buying a fixed amount of hardware for individual use, we'll buy a massive pool of hardware that people can time-share." Outside of cloud providers having to mark up prices to give themselves a net-margin, is there something else they are failing to do, hence creating the need for these projects? |
|
1) Margins. Public cloud investors expect a certain margin profile. They can’t compete with Lambda/Fluidstack’s margins.
2) To an extent also big clouds have worse networking for LLM training. I believe only Azure has infiniband. Oracle is 3200 Gbps but not infiniband, same for AWS I believe. GCP not sure but their A100 networking speeds were only 100 Gbps I believe rather than 1600. Whereas lambda, fluidstack and coreweave all have ib.
3) Availability. Nvidia isn’t giving big clouds the allocation they want.