Hacker News
by jstx1 1015 days ago
> Train your LLM at scale on our infrastructure

Is it really their infrastructure or are they using a cloud provider and this wraps it up and provides convenience for a price?

3 comments

Azure and the like get such massive economies of scale that HF's own GPUs would probably be more expensive anyway, even if they went with AMD or Intel.

It does seem like they should run their own storage nodes, with the sheer quantity of models they host...

Everyone claims that, yet I have never seen it happen.

Typically, small companies get rebates on NVIDIA GPUs, but big established ones do not. So I would expect a startup with 100 GPUs to pay less per GPU than Azure.

I'd think "infrastructure" includes the nice front end and Python API that they have already proven capable of pulling off.
What’s the difference?
You end up paying more in the latter instance.
Not counting the cost of learning how to cluster together 500 GPUs, the cost of learning how to train models efficiently on 500 GPUs, the cost of convincing a cloud provider to let you get 500 GPUs, the cost of trying to find a cloud provider that actually has 500 GPUs you can book, etc., etc.