Hacker News
by EdwardDiego 600 days ago
Unfortunately, you present a false dichotomy; it's not a binary choice between fully managed and entirely roll-your-own.

E.g., if you're running K8s (one thing I typically recommend you buy a managed one of), you can install your own Kafka in it, using an operator that does about 85% of what MSK does.

Sure, you'll need to dedicate person hours to support the operator, but is supporting that any more expensive than supporting AWS products? That you're already paying through the nose for?

It also depends on what kind of startup you are. What kind of workload do you have?

If you are bootstrapping a CRUD app business, then one beefy Hetzner box (or something slightly more reliable) with PostgreSQL is probably fine until you reach the scale where you sell the business. You care about burn rate above all.

If you are VC-backed, go all in on GCP or AWS, because that's what you're expected to do and what the expensive people you hire are going to know.

I agree but would slightly modify it in that if you have taken VC money, growth probably matters above all else. Don't waste time on activities not related to the product being sold.

I really wonder whether a VC would rather invest in a startup with an architect focusing on KISS or one where the architect goes all in on cloud.

You can open a ticket and make the weirdest of issues with MSK Amazon’s problem to deal with.

Same with RDS, etc.

It’s pretty great not to waste time when you draw one of those bizarre 0.000001% issues in the lottery.

The operator only solves the happy path. An AWS support ticket usually can solve the unhappy path.

Sure, but you can also go to a Slack channel and get help from the people who wrote the FOSS code you're using.

For free.

Yep, if your Kafka is mission critical and crashes hard, that is bad.

But things like Kafka are _never_ a black box you just spin up and never worry about. If anyone thinks so, the CAP theorem will give them an awful surprise one day.

You're always going to need someone in your team who understands the tech and how to make best use of it.

MSK won't tell you how many partitions your topic needs, or whether your retention strategy should be delete, or compact, or both.

You still need that knowledge of the "managed" service to make effective use of it.

And that knowledge sits rather close to knowledge of how the system works, so given you'll need that knowledge anyway, may as well cultivate it instead.
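
For what it's worth, here is a rough Python sketch of the two decisions mentioned above. The partition estimate is a commonly cited rule of thumb (divide target throughput by what one partition sustains on the producer side and on the consumer side, then take the larger), not something from this thread, and the function names and throughput figures are made up for illustration. `cleanup.policy` and `retention.ms` are real Kafka topic configs; everything else is an assumption.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Rule-of-thumb partition count (hypothetical helper).

    Picks enough partitions that neither the producer side nor the
    consumer side is the bottleneck for the target throughput.
    """
    if min(target_mb_s, producer_mb_s_per_partition,
           consumer_mb_s_per_partition) <= 0:
        raise ValueError("throughput figures must be positive")
    return max(
        math.ceil(target_mb_s / producer_mb_s_per_partition),
        math.ceil(target_mb_s / consumer_mb_s_per_partition),
    )


def topic_config(policies, retention_ms=None):
    """Build a topic config dict (hypothetical helper).

    policies: iterable containing "delete", "compact", or both.
    retention_ms: optional time-based retention, used by "delete".
    """
    chosen = set(policies)
    if not chosen or not chosen <= {"delete", "compact"}:
        raise ValueError("policies must be a subset of delete/compact")
    config = {"cleanup.policy": ",".join(sorted(chosen))}
    if retention_ms is not None:
        config["retention.ms"] = str(retention_ms)
    return config
```

With a 100 MB/s target, 10 MB/s per partition on the producer side and 20 MB/s on the consumer side, `estimate_partitions(100, 10, 20)` gives 10. You'd still want to measure the per-partition numbers on your own hardware rather than trust anyone's defaults, which is rather the point of the comment.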

Oh, and the operators also solve a lot of the unhappy paths too, FYI.

I tend to describe the operator approach as "half-managed" because things like multiple-AZ stretch clusters need some configuration.

But then, maybe you didn't want a 3-AZ cluster? Maybe a 2.5? MSK says no.

> You're always going to need someone in your team who understands the tech and how to make best use of it.

> And that knowledge sits rather close to knowledge of how the system works, so given you'll need that knowledge anyway, may as well cultivate it instead.

This has been my argument forever, and it’s always met with disagreement, because entirely too many people have no desire to learn their tooling. They just want an API that they can push data into, and get it back out. What happens inside is irrelevant.

It’s extremely sad to me.

Or hold off on the academic-style pretentiousness and come back down to the real world.

At some point, we have to recognize that there are a lot of knowledge expectations depending on your stack, especially as parts of your application grow.

Say you're a Python-based webapp running with Postgres, Kafka, and Elasticsearch. Your stack requires pretty decent knowledge of:

1. Postgres

2. Kafka

3. ElastiCache

4. Linux (and a lot more than what many developers I've encountered seem to have)

5. Kubernetes, because it is 2024

6. Whatever frameworks you're doing with your webapp + ensuring you're keeping up with security best practices

7. + the soup involved with exposing your webapp to customers

Being able to handle any of these at scale requires a different skillset. It's unreasonable to expect anyone to be an expert at all of this -- in a real, tried-and-true environment -- especially with deadlines and SLAs involved.

Counterpoint: stop the sprawl. Use boring technology.

Until you’re at quite a high scale, you probably don’t actually need Kafka. There are plenty of much lighter ways to do pub/sub, including Postgres itself.
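
As one illustration of the Postgres route, a minimal sketch of the publishing side using `pg_notify` (the function form of `NOTIFY`). It assumes a DB-API connection from a psycopg-style driver with `%s` placeholders; `publish` and the channel name are hypothetical, and the size check reflects the 8000-byte default payload limit that Postgres documents for NOTIFY.

```python
import json

# Default NOTIFY payload limit in Postgres: must be shorter than 8000 bytes.
MAX_PAYLOAD_BYTES = 8000


def publish(conn, channel, event):
    """Send a JSON event on a Postgres notification channel.

    conn is assumed to be a DB-API connection (psycopg-style, with
    %s placeholders). Notifications are delivered on commit.
    """
    payload = json.dumps(event)
    if len(payload.encode("utf-8")) >= MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large for NOTIFY; store it in a "
                         "table and send only its id")
    cur = conn.cursor()
    try:
        cur.execute("SELECT pg_notify(%s, %s)", (channel, payload))
    finally:
        cur.close()
    conn.commit()
```

A subscriber would run `LISTEN orders` on its own connection and poll the driver for notifications. Nothing is queued for listeners that aren't connected, which is part of why this stays light and also why it isn't a drop-in Kafka replacement.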

Similarly, if your RDBMS schema is properly defined and your queries are well-written, you probably also don’t need Redis / EC.

Re: K8s, if you do need it, I’m not sure why people think that it’s so much easier to run EKS than your own cluster. The only thing you get to skip is the control plane; everything else is still your responsibility. Same with Postgres – you still are wholly responsible for its schema/table maintenance and optimization on major DBaaS.

In any case, nowhere did I say one person should be an expert at all of this.

> Until you’re at quite a high scale, you probably don’t actually need Kafka.

As someone who accidentally specialised in Kafka... ...bingo.

So many companies use it without needing the sheer scale it offers, and get to pay the complexity cost anyway, with no benefit.

> Sure, but you can also go to a Slack channel and get help from the people who wrote the FOSS code you're using.

> For free.

Relying on volunteer support of varying degrees of quality for your business sounds insane.

Also, at that point the business should really be donating or contributing to the development of the software; otherwise it's what we call a dick move.

People within my company do contribute to the development of the FOSS software we rely on :)

> Relying on volunteer support of varying degrees of quality for your business sounds insane.

Given my experiences of Confluent paid support, and my experiences of the volunteer support around Kafka, I disagree.

> For free

Not sure we agree on the meaning of this phrase in this context.

If you ever hit an issue with Kafka or Strimzi, go to their Slack; some of the most intelligent people I've ever had the privilege to work alongside will be there, helping you.

For 0 money. That kinda free.

I would prefer to say "free of charge", because that support is not actually free: it has a cost; you're just not required to pay for it.

But you know as well as I do what the other participant in this conversation means: if a for-profit entity relies on support that is "free of charge" in this way, and keeps profiting on the back of it, then that entity really ought to seriously consider a voluntary donation of some kind to support the continued maintenance and support of the product.

My company contributes to FOSS projects we use :)

And while it's super awesome that someone feels passionately enough about a piece of tech that they're willing to spend their precious resources helping others, that kind of charity is untenable. You can't expect that person to be there at 3am when systems are down and your nightly processing jobs are failing.

I am expected to produce business value at the end of the day and I wear multiple hats. Paying someone to be the expert in the room is the best value sometimes.

I’d rather focus my expertise and mental energy on other tools that are much more significant to the stack I support.

This has not been my experience at multiple companies with AWS, even with heavy spend. Your tickets have to make it through a gatekeeper who has no more idea than you how to fix the problem, and who does more triage than anything else.

In my experience, It Depends

For big flagship services you can usually get pretty good support (EC2, S3, SQS, Lambda)

For smaller/more niche services where AWS stood up a managed version of some OSS it's more hit and miss (like managed RabbitMQ).

In both cases, it definitely helps to have an open line to your TAM and send them case numbers and they'll usually do some internal nudging to keep things moving. In addition, for projects, you can usually reach out ahead of time and get some dedicated SMEs to help set things up/train you.

In either case, hopefully you've never had the displeasure of working with Azure support.

I've only had the opposite experience. Great support with amazingly deep knowledge at every level.

Same for the most part. Our TAMs have been great to work with and so have a number of engineers the handful of times we needed it. We've had moments of some back-and-forth at times, but overall I've been satisfied.

Can you? While Amazon support is one of the better ones, you are still asking for an hour or two of time from a support guy who has no idea about your use cases or internal systems.

They usually tend to be genuinely helpful but are a far cry from solving your issues themselves.

Given that AWS has been around for nearly two decades, they have probably encountered, and have a workaround or fix for, 99.99% of use cases.

Of course there’s a minuscule possibility of you having a new use case. But is that a good enough reason to build your own infrastructure? That is a business call you need to make.

The problem is that if you're a regular-sized company, you will never reach any support person with experience inside AWS ;) And paying for Enterprise-grade support at a medium to small scale is probably more expensive than just hiring 1 skilled operator. And in the latter case, it then doesn't matter anymore if the problem takes 1 hour or 10 hours because your employee can take as much time as needed.

That's ultimately the question. It comes down to cost and time. If you have enough scale that hiring a full-time person is more cost effective than paying for managed, great. On the flip side, you don't necessarily want to take engineering hours away from building the product you sell.
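
If it helps to put numbers on that, here is a back-of-the-envelope sketch in Python. Every figure and name in it is hypothetical (none of them come from this thread or from any price list); plug in your own managed-service premium, support plan cost, and fully loaded cost of an operator.

```python
def cheaper_option(managed_premium_per_year, support_plan_per_year,
                   operator_cost_per_year, operator_fraction=1.0):
    """Compare the yearly cost of 'managed + paid support' with 'in-house'.

    managed_premium_per_year: extra you pay for managed services over
        running the same thing yourself (assumption: you can estimate it).
    support_plan_per_year: cost of the vendor support plan.
    operator_cost_per_year: fully loaded cost of one skilled operator.
    operator_fraction: share of that person's time the work really takes.

    Returns (cheaper option, yearly saving versus the other option).
    """
    managed = managed_premium_per_year + support_plan_per_year
    in_house = operator_cost_per_year * operator_fraction
    if in_house < managed:
        return ("in-house", managed - in_house)
    return ("managed", in_house - managed)


# Illustrative inputs only, not real prices.
print(cheaper_option(60_000, 180_000, 200_000))
print(cheaper_option(20_000, 0, 200_000, operator_fraction=0.5))
```

This ignores the opportunity cost mentioned above (engineering hours not spent on the product), which is usually the harder number to pin down.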