| I know how much we paid and it is substantially less than what you were quoted - very likely from one of the 12 providers you contacted. It is likely you just didn't realize how much margin these providers have and did not negotiate enough. How else do you think cloud providers are able to afford the rates they are giving? The way you describe it, places like Coreweave are operating as a charity. That isn't true - they just got better prices than you. Our inference setup is 7 figures, has been running for a while (with new servers purchased frequently along the way) and there have been no issues - the cards, CPU, RAM, are all top of the line server hardware. 1. For inference (which is 80%+ of our need) our utilization is 100% 24/7/365. For stuff that is variable (like training) we often do use cloud - as I mentioned we do both. 2. I am the CEO so I am not sure who I'm asking for budget? 3. At this point we would have paid more for cloud than what we spent purchasing our own hardware. There is nothing stopping us from getting new hardware or cloud with newer cards while still getting to own our current hardware. In fact since our costs over the last year were lower due to us buying our own hardware it is actually easier for us to afford newer cards. |
I was mainly talking about training workloads; inference is a different beast. I'm actually surprised you have 100% inference utilization: customer load typically scales dynamically, so with on-prem servers you would need to over-provision.
CEOs don't usually order hardware; they have IT people for that, with input from people like me (ML engineers) who can estimate the workloads, future needs, and specific hardware requirements (e.g. GPU memory). And when your people come to you asking for budget, while you're trying to raise the next round, you're more likely to approve the 'no high upfront cost' option, right?
In my situation, when asked about buy vs. rent, my initial reaction was "definitely buy". But when I actually looked at the numbers, the 3-year break-even period, the lack of upfront costs for cloud, and not having to provision storage and networking made cloud an easy recommendation. The cost of cloud GPUs has come down dramatically in the last couple of years.
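To make the break-even arithmetic concrete, here is a rough sketch of the calculation. Every number in it (purchase price, monthly operating cost, cloud rate, GPU count) is a made-up placeholder, not a real quote, and the function name `break_even_months` is just illustrative. It also ignores financing, resale value, and price drops over time, so treat it as a back-of-the-envelope model only.

```python
def break_even_months(purchase_cost, monthly_opex, cloud_rate_per_gpu_hour,
                      num_gpus, utilization):
    """Months until cumulative cloud spend would exceed the cost of owning.

    Returns None if owning never pays off under these assumptions.
    All inputs are hypothetical; utilization is the fraction of hours
    the GPUs are actually busy (1.0 means 24/7).
    """
    hours_per_month = 730  # average hours in a calendar month
    # With cloud you only pay for the hours you actually use.
    cloud_monthly = (cloud_rate_per_gpu_hour * num_gpus
                     * hours_per_month * utilization)
    # Owning still costs power, hosting, and maintenance every month.
    monthly_savings = cloud_monthly - monthly_opex
    if monthly_savings <= 0:
        return None
    return purchase_cost / monthly_savings


if __name__ == "__main__":
    # Placeholder figures for a single 8-GPU server.
    for utilization in (1.0, 0.5, 0.3, 0.1):
        months = break_even_months(
            purchase_cost=250_000,
            monthly_opex=2_000,
            cloud_rate_per_gpu_hour=2.0,
            num_gpus=8,
            utilization=utilization,
        )
        label = "never" if months is None else f"{months:.1f} months"
        print(f"utilization {utilization:.0%}: break-even {label}")
```

The point of the sketch is how sensitive the answer is to utilization: at a steady 100% the hardware pays for itself much sooner than it does for bursty workloads such as training, which is why the two of us can look at the same market and reach opposite conclusions.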
Though I would like to have at least a couple of local GPU servers for quick experimentation/prototyping, because sometimes the overhead of spinning up a new instance and copying datasets is too great relative to the task.