I'd suggest finding a cheaper vendor if that is the lowest price you can get for an 8xA100 server. We spend a lot on both cloud and our own hardware, and we colo our servers, so I've definitely done the math!
Six months ago I contacted 12 different vendors, and the quotes for four 8xA100 servers ranged from 130k to 200k each. You probably wouldn't want to buy from the low-end vendors.
Keep in mind that the cloud has three important advantages:
1. You only pay for what you use (hourly). What is the utilization of your on-prem servers?
2. You don't have to pay upfront, which makes it easier to ask for budget.
3. You can upgrade your hardware easily as soon as new GPU models become available.
I know how much we paid, and it is substantially less than what you were quoted, very likely by one of the 12 providers you contacted.
It is likely you just didn't realize how much margin these providers have and didn't negotiate enough. How else do you think cloud providers can afford the rates they offer? The way you describe it, places like CoreWeave are operating as charities. That isn't true; they just got better prices than you.
Our inference setup cost 7 figures and has been running for a while (with new servers purchased frequently along the way), and there have been no issues: the cards, CPUs, and RAM are all top-of-the-line server hardware.
1. For inference (which is 80%+ of our need), our utilization is 100%, 24/7/365. For variable workloads (like training) we often do use cloud; as I mentioned, we do both.
2. I am the CEO, so I'm not sure who I would be asking for budget.
3. By now we would have paid more for cloud than we spent purchasing our own hardware. Nothing stops us from getting new hardware, or cloud instances with newer cards, while still owning our current hardware. In fact, since our costs over the last year were lower because we bought our own hardware, it is actually easier for us to afford newer cards.
Yes, obviously cloud providers get their hardware at a fraction of the cost I was quoted; they are ordering thousands of servers. I was only buying four. No one would negotiate with me; I tried. I suppose if I had a 7 digit budget I could get a better deal.
I was mainly talking about training workloads; inference is a different beast. I'm actually surprised you have 100% inference utilization - customer load typically scales dynamically, so with on-prem servers you would need to over-provision.
CEOs don't usually order hardware, they have IT people for that, with input from people like me (ML engineers) who could estimate the workloads, future needs, and specific hw requirements (e.g. GPU memory). And when your people come to you asking for budget, while you're trying to raise the next round, you're more likely to approve the 'no high upfront cost' option, right?
In my situation, when asked about buying vs. renting, my initial reaction was "definitely buy". But when I actually looked at the numbers, the 3-year break-even period, the absence of upfront costs for cloud, and not having to provision storage and networking made cloud an easy recommendation. The cost of cloud GPUs has come down dramatically in the last couple of years.
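The break-even periods argued about in this thread come down to a simple ratio: the upfront hardware price divided by the monthly amount you save by not renting. The sketch below is a minimal illustration under stated assumptions, not either poster's actual model. The function name `break_even_months`, the hourly cloud rate, and the monthly on-prem operating cost (colo, power, support) are all hypothetical inputs; only the 130k purchase price comes from the quotes mentioned earlier. It also ignores resale value, cost of capital, and staff time.

```python
def break_even_months(purchase_price: float,
                      cloud_rate_per_hour: float,
                      utilization: float,
                      monthly_onprem_opex: float,
                      hours_per_month: float = 730.0) -> float:
    """Months until owning a server becomes cheaper than renting equivalent capacity.

    Assumptions (hypothetical, for illustration only):
    - Cloud is billed only for the hours actually used (utilization * hours).
    - On-prem has a fixed monthly operating cost (colo, power, support)
      that is paid whether or not the GPUs are busy.
    - hours_per_month defaults to 8760 / 12 = 730.
    Returns float('inf') if renting is cheaper every month.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    monthly_cloud = cloud_rate_per_hour * hours_per_month * utilization
    monthly_saving = monthly_cloud - monthly_onprem_opex
    if monthly_saving <= 0:
        return float("inf")  # owning never pays off at this utilization
    return purchase_price / monthly_saving


if __name__ == "__main__":
    # Hypothetical inputs, all in the same currency unit: one 8-GPU server at
    # the low end of the quotes, an assumed rate of 20/hour for a comparable
    # 8-GPU cloud instance, and an assumed 3,000/month on-prem operating cost.
    for util in (1.0, 0.3):
        months = break_even_months(130_000, 20.0, util, 3_000)
        print(f"utilization {util:.0%}: break-even after {months:.1f} months")
```

With these made-up inputs, a fully utilized server breaks even after about 11 months, while 30% utilization pushes break-even past 7 years. That is why the utilization question matters so much: on-prem costs accrue whether or not the GPUs are busy, while cloud costs scale with use.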
That said, I would like to have at least a couple of local GPU servers for quick experimentation and prototyping, because sometimes the overhead of spinning up a new instance and copying datasets is too great relative to the task.
> I suppose if I had a 7 digit budget I could get a better deal.
We got our "deal" when buying just a single server and have since bought many more from the same provider. We didn't spend 7 figures all at once; we did it piecemeal over time. There is nothing stopping you from getting much better prices.
> I'm actually surprised you have 100% inference utilization - customer load typically scales dynamically, so with on-prem servers you would need to over-provision.
It is pretty easy to achieve 100% inference utilization if you can find inference work that does not need to be done on demand. We have a priority queue, and lower-priority work gets done during periods of lower demand.
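The priority-queue approach described here can be sketched with Python's `heapq`. This is a minimal illustration, not the poster's actual system: the two priority levels (`ON_DEMAND`, `BACKFILL`), the `InferenceQueue` class, and the idea of scheduling in terms of free GPU slots are all hypothetical. Customer-facing requests always sort ahead of deferrable work, so backfill jobs only consume GPUs that on-demand traffic leaves idle.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is served first.
ON_DEMAND = 0  # customer-facing requests that must be served now
BACKFILL = 1   # deferrable work, e.g. batch inference that can wait


@dataclass(order=True)
class Job:
    priority: int
    seq: int                          # tie-breaker: FIFO within a priority level
    name: str = field(compare=False)  # not used for ordering


class InferenceQueue:
    """Min-heap of jobs ordered by (priority, submission order)."""

    def __init__(self) -> None:
        self._heap: list[Job] = []
        self._counter = itertools.count()

    def submit(self, name: str, priority: int) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._counter), name))

    def schedule(self, free_slots: int) -> list[str]:
        """Start up to free_slots jobs, highest priority first."""
        started = []
        while self._heap and len(started) < free_slots:
            started.append(heapq.heappop(self._heap).name)
        return started

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = InferenceQueue()
    # Deferrable work queued up ahead of time.
    for i in range(5):
        q.submit(f"batch-{i}", BACKFILL)
    # Busy period: three customer requests arrive and 4 GPU slots are free.
    for name in ("req-a", "req-b", "req-c"):
        q.submit(name, ON_DEMAND)
    print(q.schedule(4))  # ['req-a', 'req-b', 'req-c', 'batch-0']
    # Quiet period: no new customer traffic, so backfill fills every slot.
    print(q.schedule(4))  # ['batch-1', 'batch-2', 'batch-3', 'batch-4']
```

One design caveat: this sketch does not preempt. If backfill jobs occupy every GPU when a burst of customer requests arrives, those requests wait until slots free up, so in practice you would keep backfill jobs short or make them preemptible.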
> CEOs don't usually order hardware, they have IT people for that, with input from people like me (ML engineers) who could estimate the workloads, future needs, and specific hw requirements (e.g. GPU memory).
Judging by this conversation, it seems like "people like you" may not be the best people to answer this question, since the best hardware quote you could get was at a >100% markup! At a startup that specializes in ML research and work, the CEO is going to be intimately familiar with ML workloads, needs, and hardware requirements.
> And when your people come to you asking for budget, while you're trying to raise the next round, you're more likely to approve the 'no high upfront cost' option, right?
If the break-even point is 6-7 months and our runway is longer than that, why would this matter?