A lot of startups are convinced they'll need Google scale from day one. Then, of course, the overwhelming majority fail in the first year.
Get a big, reliable, and cheap vhost/server somewhere, use as many "can't really go all that wrong" components (postgres, minio, etc.) as you can, and dockerize everything. If you want to get "fancy", use ZFS and set up some snapshots and backups. Most solutions don't even need 100% uptime. Communicate maintenance windows to customers and you'll be fine with a total of an hour of downtime (or whatever) in the first year. Most big, really complex, prematurely over-engineered and unnecessarily "optimized" solutions have enough footguns that you'll probably end up with more unscheduled downtime in the first year anyway.
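To put the "hour of downtime in the first year" figure in perspective, here's a quick back-of-the-envelope sketch. The function names are made up for illustration, and it assumes a plain 365-day year:

```python
# Rough availability arithmetic; assumes a 365-day year (no leap day).
HOURS_PER_YEAR = 365 * 24  # 8760


def availability_pct(downtime_hours: float) -> float:
    """Percent of the year the service was up, given total downtime in hours."""
    return 100.0 * (1 - downtime_hours / HOURS_PER_YEAR)


def downtime_budget_hours(target_pct: float) -> float:
    """Hours of downtime per year allowed by a given availability target."""
    return HOURS_PER_YEAR * (1 - target_pct / 100.0)


print(f"{availability_pct(1):.3f}%")          # 99.989%
print(f"{downtime_budget_hours(99.9):.2f}h")  # 8.76h
```

So one hour a year is already better than "three nines", which allows almost nine hours.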
In the rare event the startup really succeeds and customer demand, load, uptime requirements, etc. call for it, you can throw revenue/funding/etc. at a K8s control plane on your favorite hosting provider, a managed postgres/db/whatever, an S3-compatible object store, etc. Or, if things get really big, skip all of that and hire in-house talent to manage a couple of racks of leased hardware (opex, same as cloud, but almost always SUBSTANTIALLY cheaper) in geo-redundant/distributed colocation facilities.
I've launched multiple startups with this strategy and it's gone very well. My current startups all run from the same big (but 10-year-old) hardware that has loooooong since paid for itself, even with lots of GPU, storage, etc. upgrades over the years. People can be kind of scared of hardware, but I've never had downtime or data loss caused by a hardware failure in almost 20 years of this approach.
People are always amazed when I do things with ML, TBs of data, lots of bandwidth, etc and I tell them my total hosting costs are $150/mo.
> My current startups all run from the same big (but 10yr old) hardware
I'd love to hear more about the setup. I suspect something as simple as a disk failure would cause an outage, although I suppose you can detect a soon-to-fail disk via SMART and resolve that with scheduled maintenance/downtime. But what about power supply failure? Do you keep redundant backup parts on hand?
There's definitely something nice about having a hardware error on a cloud VM result in that VM cycling out to new hardware automatically. In contrast, something as simple as buying a new off-the-shelf PSU feels like a ~1 hour downtime event (longer if you don't have purchase card authority, it's night time, you need to order online, etc.).
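For the SMART part, a minimal sketch of what a scheduled check could look like, assuming smartmontools 7.0 or newer (for `smartctl --json`) and that the report carries a `smart_status.passed` field. The function names and the `/dev/sda` device path are placeholders, not anything from the setup described here:

```python
import json
import subprocess


def disk_is_healthy(report: dict) -> bool:
    """True only if the parsed smartctl JSON report explicitly says the health check passed."""
    return report.get("smart_status", {}).get("passed") is True


def read_report(device: str) -> dict:
    """Run `smartctl -H -j <device>` and parse its JSON output (usually needs root)."""
    # smartctl uses its exit code as a set of status bit flags, so a non-zero
    # exit is not necessarily a failure to run; don't pass check=True here.
    result = subprocess.run(
        ["smartctl", "-H", "-j", device],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Example (hypothetical device path), e.g. from a cron job:
#   if not disk_is_healthy(read_report("/dev/sda")):
#       ...alert and schedule a maintenance window...
```

A passing overall health check doesn't guarantee the disk won't die tomorrow, so this is an early warning at best and no substitute for redundancy.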
I like to use ZFS raidz2 at a minimum. More or less bulletproof from a storage/hardware standpoint.
The system I referenced currently has 8x 4TB NVMe drives on ZFS raidz2 and 8x 16TB rust drives, also on raidz2. I use sanoid to snapshot like crazy (down to every 15 minutes) and then syncoid to push those snapshots (and then some) to the spinning-disk ZFS array. Plus a remote zfs send to an offsite machine.
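For a rough sense of what those two raidz2 arrays give you, here's an estimate that just subtracts two drives' worth of parity. It ignores padding, metadata, reserved space, and the TB vs TiB difference, so real usable space will come out lower. The function name is made up for illustration:

```python
def raidz2_usable_tb(drives: int, drive_tb: float) -> float:
    """Approximate usable capacity of a single raidz2 vdev, in TB.

    raidz2 spends two drives' worth of space on parity and survives any
    two drive failures in the vdev. Overheads are ignored here.
    """
    if drives <= 2:
        raise ValueError("raidz2 needs more than two drives")
    return float((drives - 2) * drive_tb)


print(f"NVMe array:  {raidz2_usable_tb(8, 4):.0f} TB")   # 24 TB
print(f"Rust array:  {raidz2_usable_tb(8, 16):.0f} TB")  # 96 TB
```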
Modern switching power supplies are incredibly reliable. On top of that, run proper “half load” dual power supplies from dual conditioned power feeds with UPS and generator. Losing power to a machine pretty much implies an extinction-level event or complete incompetence on the part of the data center operator.