by jasode 610 days ago
>While AWS and Azure are industry leaders, their advantages often only materialize at massive scales. [...]

Your comparisons are similar to many others out there that focus on measuring basic cpu and memory. This type of easy comparison where AWS/Azure/GCP is treated as a "dumb" datacenter is easy for alternatives like Hetzner or self-hosting to "win".

>Do you really need the advanced features of AWS and Azure right now? Or would a simple virtual machine at a reasonable price be sufficient? [...] There’s a growing movement among tech companies and startups to opt for more cost-effective hosting solutions like Hetzner. The high costs associated with AWS and Azure

Many (most?) YC startups are not using AWS as a low-level dumb data center with blank EC2 virtual machines and installing infrastructure software like Linux and PostgreSQL on it. Instead, they are using higher-level AWS managed services such as DynamoDB, Kinesis, SQS, etc :

Therefore, the more difficult comparison (that almost no blog post ever does) is the startup's costs for its employees to re-create/re-invent the set of higher-level AWS services that they need.

Sure, there's the "but you don't need to pay expensive AWS costs for DynamoDB when one can just install open-source Cassandra at Hetzner; and instead of AWS Kinesis, install your own Kafka, etc". Well, you add up more and more of those "just install and manage your own X,Y,Zs" and you can end up crossing the threshold where paying AWS cloud fees costs less than your staff maintaining it. The threshold for AWS isn't just massive scale of 100+ million users. The threshold can be the complexity and scope of higher-level services you need the cloud to take care of on your behalf so your small team can concentrate on the aspects of the business that are true differentiators. In other words, instead of employees installing Cassandra, they're adding features to the smartphone app.
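
To make the threshold argument concrete, here is a rough sketch of the comparison, not a real pricing model. Every number, the `SelfHostedService` type, and the function names are made up for illustration; plug in your own infra bills, ops hours, and staff cost.

```python
from dataclasses import dataclass


@dataclass
class SelfHostedService:
    """One 'just install and manage your own X' item. All figures are hypothetical."""
    name: str
    infra_cost_per_month: float   # raw servers, e.g. at Hetzner
    ops_hours_per_month: float    # staff time for upgrades, backups, incidents


def self_hosted_monthly_cost(services, hourly_staff_cost):
    """Infra bill plus the staff time needed to keep the services running."""
    return sum(
        s.infra_cost_per_month + s.ops_hours_per_month * hourly_staff_cost
        for s in services
    )


def cheaper_option(services, managed_monthly_bill, hourly_staff_cost):
    """Return 'managed' or 'self-hosted', whichever has the lower monthly total."""
    diy = self_hosted_monthly_cost(services, hourly_staff_cost)
    return "managed" if managed_monthly_bill < diy else "self-hosted"


# Hypothetical inputs, for illustration only.
stack = [
    SelfHostedService("cassandra", infra_cost_per_month=300, ops_hours_per_month=20),
    SelfHostedService("kafka", infra_cost_per_month=200, ops_hours_per_month=15),
]
print(self_hosted_monthly_cost(stack, hourly_staff_cost=100))
print(cheaper_option(stack, managed_monthly_bill=3000, hourly_staff_cost=100))
```

The point of the sketch is only that `ops_hours_per_month` grows with every extra service you run yourself, so the answer can flip long before you reach massive scale.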

If your company doesn't need any of the Big 3 clouds' higher-level platform services, it's easier to save money with alternatives.

12 comments

Continuing this reasoning...

As soon as your startup does get big, it starts to make more sense to try and migrate to 'dumb' machines and save on infrastructure costs, especially if your business is low margin and your infrastructure costs are high.

The flip side is that when you are small, you probably don’t need all the fancy managed services that AWS offers. Simpler solutions can save you money and time.
I'm a big proponent of appengine/heroku and similar platforms for small startups.

You can almost certainly fit all your business logic into one or two appengine apps, and fit all your data into one database. While you have just a few programmers, the fact they're all sharing a process with each other won't matter.

The goal is working product and paying customers ASAP, not a nicely architected microservices backend 2 years from now.

Yes, it'll end up being a mess when the company has pivoted and changed directions a bunch of times, and when you finally come to get to 50M users+ scale you'll probably have to rewrite from scratch. But by then, you ought to be rewriting from scratch, because you won't know the true requirements till you get to that scale.

Unfortunately by that time you're mired in EKS, SQS, EFS and whatever other 3-letter services, unpicking which is more expensive than months of operation on AWS.
And then all of a sudden you run into more engineering costs. Companies use platform services because one dev/engineer can do a lot more on their own and focus on delivering business value rather than twiddle knobs.

And adding one dev/engineer is _massively_ more expensive, so you seldom want to scale in that axis when the option is to, say, use a managed database or even a complete data pipeline.

I agree, that's mostly because the usage patterns have become apparent and you know what features you need and what to optimise for. That's why I prefer managed services to start and then can self-host once price or needs pushes me to.
Agreed! See also the ahrefs example
The caveat to this is, you might think you need RDS, SQS, SNS, S3, Lambda, DynamoDB, Elasticache and Kinesis - but you probably only need Postgres.
SQS/SNS/S3 are so simple, reliable, and cheap they're pretty much a no brainer. While you can probably run those workloads in Postgres, it isn't designed for those use cases and you'll eventually run into nasty limitations like managing vacuums with high churn tables and slow/complicated backups with big binary blobs.

If you have a good understanding of load up front, however, those are probably non-issues.

I know, I'm mostly being tongue in cheek - the joke is so many companies go straight to complex cloud configurations more for the vibes than the actual practical need; a single box (two for availability) and a solid db will get most sites and businesses very far.
S3 is mind bogglingly expensive compared to Hetzner.
You might not technically need it but some of the things offered by those services might be ‘nice to have’ for your specific use case. If they are not available with just Postgres+etc out of the box the few hundred/thousand $ additional costs might be entirely insignificant compared to the additional work-hours you’d need to implement those things.
one day someone will rewrite postgres in rust, and i will have to switch to carpentry full time to preserve my sanity.
You might need to do that even before it ships the complete feature set.
Unfortunately, it's a false dichotomy you present, it's not a binary choice of fully managed or entirely roll your own.

E.g., if you're running K8s (one thing I typically recommend you buy a managed one of), you can install your own Kafka in it, using an operator that does about 85% of what MSK does.

Sure, you'll need to dedicate person hours to support the operator, but is supporting that any more expensive than supporting AWS products? That you're already paying through the nose for?

It's also, what kind of startup are you? What kind of workload do you have?

If you are bootstrapping a crud app business then 1 beefy hetzner box (or something slightly more reliable) with postgresql is probably fine until you reach scale where you sell the business. You care about burn rate above all.

If you are VC backed go all in on gcp or aws because that's what you're expected to do and what the expensive people you hire are going to know.

I agree but would slightly modify it in that if you have taken VC money, growth probably matters above all else. Don't waste time on activities not related to the product being sold.
I really wonder whether a VC would rather invest into a startup with an architect focusing on KISS or one where the architect goes all in on cloud.
You can open a ticket and make the weirdest of issues with MSK Amazon’s problem to deal with.

Same with RDS, etc.

It’s pretty great not to waste time when the lottery for the bizarrest of 0.000001% issues arise.

The operator only solves the happy path. An AWS support ticket usually can solve the unhappy path.

Sure, but you can also go to a Slack channel and get help from the people who wrote the FOSS code you're using.

For free.

Yep, if your Kafka is mission critical and crashes hard, that is bad.

But things like Kafka are _never_ a black box you just spin up and never worry about, if anyone thinks so, CAP theorem will give them an awful surprise one day.

You're always going to need someone in your team who understands the tech and how to make best use of it.

MSK won't tell you how many partitions your topic needs, or whether your retention strategy should be delete, or compact, or both.
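
For anyone wondering what those two decisions look like in practice, here is a rough sketch. The sizing function is the common throughput rule of thumb, not anything MSK or Kafka computes for you, and the per-partition throughput figures are numbers you would have to measure yourself. The function names are mine; `cleanup.policy` is the actual Kafka topic setting, and `compact,delete` is how "both" is spelled.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Rule-of-thumb partition count: enough partitions that neither the
    produce side nor the consume side is the bottleneck at the target rate."""
    needed_for_producers = target_mb_s / producer_mb_s_per_partition
    needed_for_consumers = target_mb_s / consumer_mb_s_per_partition
    return max(1, math.ceil(max(needed_for_producers, needed_for_consumers)))


def topic_config(retention):
    """Map a retention strategy to Kafka's `cleanup.policy` topic setting."""
    policies = {
        "delete": "delete",          # drop old segments by time/size
        "compact": "compact",        # keep the latest record per key
        "both": "compact,delete",    # compact, and also expire old data
    }
    return {"cleanup.policy": policies[retention]}


# Hypothetical workload: 50 MB/s target, measured 20 MB/s per partition
# when producing and 15 MB/s per partition when consuming.
print(estimate_partitions(50, 20, 15))
print(topic_config("both"))
```

Neither number comes out of the managed service; you still have to measure your workload and know whether consumers need the full history or just the latest value per key.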

You still need that knowledge of the "managed" service to make effective use of it.

And that knowledge sits rather close to knowledge of how the system works, so given you'll need that knowledge anyway, may as well cultivate it instead.

Oh, and the operators also solve a lot of the unhappy paths too, FYI.

I tend to describe the operator approach as "half-managed" because things like multiple-AZ stretch clusters need some configuration.

But then, maybe you didn't want a 3-AZ cluster? Maybe a 2.5? MSK says no.

> You're always going to need someone in your team who understands the tech and how to make best use of it.

> And that knowledge sits rather close to knowledge of how the system works, so given you'll need that knowledge anyway, may as well cultivate it instead.

This has been my argument forever, and it’s always met with disagreement, because entirely too many people have no desire to learn their tooling. They just want an API that they can push data into, and get it back out. What happens inside is irrelevant.

It’s extremely sad to me.

Or hold off on the academic-style pretentiousness and come back down to the real world.

At some point, we have to decide that there's a lot of knowledge expectations depending on your stack, especially as parts of your application grow.

Say you're a Python-based webapp running with Postgres, Kafka, and Elasticsearch. Your stack requires pretty decent knowledge of:

1. Postgres

2. Kafka

3. Elasticache

4. Linux (and a lot more than what many developers I've encountered seem to have)

5. Kubernetes, because it is 2024

6. Whatever frameworks you're doing with your webapp + ensuring you're keeping up with security best practices

7. + the soup involved with exposing your webapp to customers

Being able to handle any of these 7 at scale requires different skillsets. It's unreasonable to expect anyone to be an expert at all of this -- in a real, tried-and-true environment -- especially with deadlines and SLAs involved.

Counterpoint: stop the sprawl. Use boring technology.

Until you’re at quite a high scale, you probably don’t actually need Kafka. There are plenty of much lighter ways to do pub/sub, including Postgres itself.

Similarly, if your RDBMS schema is properly defined and your queries are well-written, you probably also don’t need Redis / EC.

Re: K8s, if you do need it, I’m not sure why people think that it’s so much easier to run EKS than your own cluster. The only thing you get to skip is the control plane; everything else is still your responsibility. Same with Postgres – you still are wholly responsible for its schema/table maintenance and optimization on major DBaaS.

In any case, nowhere did I say one person should be an expert at all of this.

> Sure, but you can also go to a Slack channel and get help from the people who wrote the FOSS code you're using.

> For free.

Relying on volunteer support of varying degrees of quality for your business sounds insane.

Also at that point the business should really be donating or contributing to the development of the software otherwise it is considered what we call a dick move.

People within my company do contribute to the development of the FOSS software we rely on :)

> Relying on volunteer support of varying degrees of quality for your business sounds insane.

Given my experiences of Confluent paid support, and my experiences of the volunteer support around Kafka, I disagree.

> For free

Not sure we agree on the meaning of this phrase in this context.

If you ever hit an issue with Kafka or Strimzi, go to their Slack, some of the most intelligent people I've ever had the privilege to work alongside will be there, helping you.

For 0 money. That kinda free.

I would prefer to say "free of charge", because that support is not actually free, it has a cost, you're just not required to pay for it.

But you as well as I know, that what the other participant in this conversation means, is that if a for-profit entity relies on support that is "free of charge" in this way, such that it can continue to profit on the back of their product support, then the for-profit entity really ought to seriously consider a voluntary donation of some kind to support the continued maintenance and support of the product.

And while that's super awesome that someone feels passionately enough about a piece of tech - that they're willing to spend their precious resources helping others... that kind of charity is untenable. You can't expect that person to be there at 3am when systems are down and your nightly processing jobs are failing.
I am expected to produce business value at the end of the day and I wear multiple hats. Paying someone to be the expert in the room is the best value sometimes.

I’d rather focus on my expertise and mental energy in other tools that are much more significant to the stack I support.

This has not been my experience at multiple companies with AWS, even with heavy spend – your tickets have to make it through a gatekeeper who has no more idea than you on how to fix it, and more triage than anything else.
In my experience, It Depends

For big flagship services you can usually get pretty good support (EC2, S3, SQS, Lambda)

For smaller/more niche services where AWS stood up a managed version of some OSS it's more hit and miss (like managed RabbitMQ).

In both cases, it definitely helps to have an open line to your TAM and send them case numbers and they'll usually do some internal nudging to keep things moving. In addition, for projects, you can usually reach out ahead of time and get some dedicated SMEs to help set things up/train you.

In either case, hopefully you've never had the displeasure of working with Azure support.

I only have the opposite. Great support with amazingly deep knowledge at every level.
Same for the most part. Our TAMs have been great to work with and so have a number of engineers the handful of times we needed it. We've had moments of some back-and-forth at times, but overall I've been satisfied.
Can you? While Amazon support is one of the better ones, you are still asking for an hour or two of time from a support guy who has no idea about your usecases or internal systems.

They usually tend to be genuinely helpful but are a far cry from solving your issues themselves.

Given that AWS has been around for nearly two decades they have probably encountered and have a workaround/fix for 99.99% of the use cases.

Of course there’s a minuscule possibility of you having a new use case. But is that good enough reason to build your infrastructure? That is a business call you need to make.

The problem is that if you're a regular-sized company, you will never reach any support person with experience inside AWS ;) And paying for Enterprise-grade support at a medium to small scale is probably more expensive than just hiring 1 skilled operator. And in the latter case, it then doesn't matter anymore if the problem takes 1 hour or 10 hours because your employee can take as much time as needed.
That's ultimately the question. It comes down to cost and time. If you have enough scale that hiring a full-time person is more cost effective than paying for managed, great. On the flip side, you don't necessarily want to take engineering hours away from building the product you sell.
Oftentimes, when you see someone proposing "just save 70% by installing open source XYZ", they are thinking like an individual and not a business. Fast-moving startups and medium businesses in areas with high cost of labor can save a ton by outsourcing labor to AWS/Azure if they are okay with the lock-in. Of course, each case is different and people shouldn't just blindly adopt AWS/Azure without thinking about it...
Honestly most of the stuff I do is internal facing tooling with usually less than 100 concurrent and 1k peak users. For those, managing a server or two, or god forbid, a small autoscaling cluster is not a hassle.

For high-scale operations, you need to think real hard about how you do things and usually simplicity is key, and trying to do as little as possible on the high throughput parts is useful.

The costs do add up when you have professionals maintaining your Cassandra/Kafka boxes, but the same degree of complexity exists on AWS, when you try to weave together a tapestry of EC2s, lambdas, various storage services, with all the delicious complexity of multiple VPCs and networking fineries while not blowing the budget.

It's a different skillset, but not less work.

Hear hear. I get this all the time. People just don’t get that what they are paying for, say, platform services (managed databases, indexing, all sorts of data handling) is vastly cheaper than reimplementing those particular wheels - or hiring the people to manage them - and that the hyperscalers provide redundancy, automated deployment, backups, the works.

Even storage in hyperscalers is inherently redundant—and I keep getting folk who ask about setting up their own RAID array, or using their own containers and job management when there’s a dozen zero-code alternatives in each individual hyperscaler.

I can run a 64x512GiB server in my home office loaded with NVMe drives for $80/mon (probably cheaper depending on how many years you amortize the server purchase over)!
This is what we're trying to address at Lithus[1]. We're offering both the raw compute resources, and also the DevOps time needed to setup and manage the services your engineering team needs.

[1] https://lithus.eu

depends on scale - at small scale, fully managed services are a godsend but at <x> scale (esp per-service) then it pays to self-manage or use low cost or FOSS mgmt tools.
I'm not sure what the cost difference is for using higher level services but I can easily imagine it 4x-10x'ing your costs again, or worse.

Part of me thinks, man, the engineers not afraid of setting up a Postgres or Redis really should be worth a lot more, given how absurd the prices can get. I guess the getting started costs for these services are usually manageable though; by the time the bill is big it's a "nice problem to have" because you have significant load now, and presumably customers & revenue to show for it.

More so, I think orgs are somewhat rightfully afraid of running infra because historically we have been bad at it. It's been every sys-op or devops for themselves in the world. Everyone making their own practices, assembling their own stack of networking setup, init scripts, db procedures, monitoring, alerting, resilience/reliability. This stuff has a lot of dimensions of care to it.

And even when you go the extra mile to document everything, it's still rough to hand-off ownership. A new gal joins; how long does it take to get comfortable? And how much will her style & preferences mesh with what's been strung up so far? Or worse, what happens when someone quits? How load bearing were they?

And this is why I'm so humongously excited about Kubernetes. Fleet was pretty sweet & cool & direct in the past, RIP, but like so many of the "way to run containers" options it was just that: a way to run containers. Having an extensible system, where operators keep networking, storage, databases running, where tasks like backups and migrations and high availability are built in to well tested controllers: it cuts out so so so many things that operators had to discover, socialize, and test test test test test test before. There are such incredibly good load bearing systems-that-maintain-systems (i.e. autonomic) available, that compete very much with the paid for/managed services that have done likewise for us for so long.

And it's a consistent paradigm, for whatever you are up to. Write a manifest with what you want, send it to api-server, wait for operator to make it so. Instead of having different dimensions or concerns have different operational paradigms & styles, there's a unified extensible Desired State Management that does a damn good job.
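
A toy illustration of that paradigm, for anyone who hasn't seen it: declare the state you want, and let a loop diff it against what exists and act until the two match. This is plain Python with invented names (`reconcile`, `apply_actions`, `converge`) and replica counts as the only "state"; it is not the Kubernetes API or how any real operator is written, just the shape of the idea.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`.
    Both map a resource name to its replica count."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(("scale_up", name, want - have))
        elif have > want:
            actions.append(("scale_down", name, have - want))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, actual[name]))
    return actions


def apply_actions(actions, actual):
    """Apply actions to a copy of the state (a stand-in for real side effects)."""
    state = dict(actual)
    for verb, name, n in actions:
        if verb == "scale_up":
            state[name] = state.get(name, 0) + n
        elif verb == "scale_down":
            state[name] -= n
        else:  # delete
            del state[name]
    return state


def converge(desired, actual, max_rounds=10):
    """Loop until the observed state matches the declared one."""
    for _ in range(max_rounds):
        actions = reconcile(desired, actual)
        if not actions:
            return actual
        actual = apply_actions(actions, actual)
    raise RuntimeError("did not converge")


# Hypothetical "manifest" and starting state.
desired = {"postgres": 3, "kafka": 3}
actual = {"postgres": 1, "legacy-cron": 1}
print(converge(desired, actual))
```

The hand-off benefit comes from `desired` being the whole story: a newcomer reads the declared state instead of reverse-engineering someone's init scripts.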

It felt like running services was in the dark ages for so long, that each shop was fractured & alone with their infrastructure, and it was obvious why managed services were winning. But today there's a hope that we can run services, well, in a way that will be very clear & explicit if it ever needs to be handed off.

>Part of me thinks, man, the engineers not afraid of setting up a Postgres or Redis really should be worth a lot more, given how absurd the prices can get.

But only if they agree to be on call 24/7 to support what they deployed. Ask engineers to guarantee you won’t lose data and see how they tell you to buy RDS.

Not to mention having to add additional security staff.
This.

To add, if you ever want to get ISO/PCI DSS etc. certification done then good luck implementing the gazillion checklist items which Azure/AWS/GCP have already taken care of.

Which is bullshit, because the auditors ALWAYS miss stuff, even things I would think are painfully obvious. It’s a cottage industry that allows the C-Suite to assure investors that they have taken all necessary precautions, so when they get hacked they can point and say “we were certified!”
I completely agree with you that they are mostly used as CYA. However, I'm speaking from practical standpoint where if you have to work in certain industries (banking, health, finance etc.,) the first thing you are asked is if you have XYZ certification.
It’s not a cottage industry. It is literally the law if you need to operate in some regions.