Hacker News
by pudwallabee 744 days ago
I have seen Kafka pulled out by its hair and replaced with a request-based architecture.

Event-driven architecture, to me, is itself an antipattern.

It seems like a replacement for batch processing. Replayable messages are AWESOME. Until you encounter the complexity of getting a system to actually replay them consistently.

As for the author's video, while there was some truth in it, it was a little thin compared to the complexity of these architectures. I believe that even though Kafka acts the part of "dumb pipe", it doesn't stay dumb for long, and the n distributions of Kafka logs in your organization could be 1000x more expensive than a monolithic DB and a monolithic API to maintain.

Yes, it appears auditable, but is it? The big argument for replayability is that, unlike an API that falls over, there's no data loss. If you work with Kafka long enough you’ll realize that data loss will become a problem you didn't think you had. You’ll have to hire people to “look into” data loss problems constantly with Kafka. It's just too much infrastructure to even care about.

There's also something ergonomically wrong with event-driven architecture. People don't like it. It also turns people into robots who are “not responsible” for their product. There's so much infrastructure to maintain that people just punt everything back to the “enterprise Kafka team”.

The whole point of microservices was to enable flexibility, smart services and dumb pipes, and effective CI/CD and devops.

We are nearing the end of microservices adoption, whether event-driven or request-driven. In mature organizations it seems to me that request-driven is winning by a large margin over event-driven.

It may be counterintuitive, but the time to market and the cost to maintain of request-driven architecture are way, way lower.

5 comments

> I believe that even though Kafka acts the part of "dumb pipe", it doesn't stay dumb for long

In my experience programmers are very happy to do everything in the application (something database people often complain about). What kind of problems do you see?

> If you work with Kafka long enough you’ll realize that data loss will become a problem you didn't think you had. You’ll have to hire people to “look into” data loss problems constantly with Kafka.

Not my experience at all, and I've used Kafka at a wide range of companies, from household-name scale to startups. Kafka is the boring just-works technology that everyone claims they're looking for.

I'm no fan of microservices, but Kafka is absolutely the right datastore most of the time.

> and the n distributions of Kafka logs in your organization could be 1000x more expensive than a monolithic DB and a monolithic API to maintain

Not to mention certain observability vendors bleeding you for all those logs you now need to keep an eye on.

Absolutely agreed on every point

The unseen critical part of the equation
I think the problem here is Kafka and not event-driven architecture. I am a strong proponent of not using Kafka for events. It's wrong 90% of the time and for the other 10% you can find better solutions.

Also, people need to understand that "event driven" has nothing to do with "event sourcing". Don't keep all the events for eternity just because you can (or because some people think you should because "kafka").

I haven't run into weird Kafka data loss issues like you describe - although, I will note, a lot of applications don't actually have much testing to notice something like 1 in 10k messages being dropped if it was happening.[0]

But when I've done that testing, Kafka hasn't been the problem.
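
A rough sketch of the kind of test meant here, assuming each producer stamps its messages with a consecutive sequence number starting at 0. The names `find_missing`, `produced`, and `consumed` are made up for illustration and are not part of any Kafka client.

```python
from collections import defaultdict


def find_missing(produced, consumed):
    """Return {producer_id: sorted sequence numbers that never arrived}.

    produced: {producer_id: number of messages that producer sent}
    consumed: iterable of (producer_id, seq) pairs seen by the consumer
    """
    seen = defaultdict(set)
    for producer_id, seq in consumed:
        seen[producer_id].add(seq)

    missing = {}
    for producer_id, count in produced.items():
        gaps = sorted(set(range(count)) - seen[producer_id])
        if gaps:
            missing[producer_id] = gaps
    return missing


# Hypothetical run: producer "a" sent 10,000 messages and one was dropped.
produced = {"a": 10_000}
consumed = [("a", seq) for seq in range(10_000) if seq != 4_217]
print(find_missing(produced, consumed))  # {'a': [4217]}
```

Without something like this running against real traffic, a 1-in-10k drop rate is easy to miss entirely.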

The problem I've run into most is that ordering is a giant fucking pain in the ass if you actually want consistent replayability and don't have trivial partitioning needs. Some consumers want things in order by customer ID, other consumers want things in order by sold product ID, others by invoice ID? Uh oh. If you're thinking you could easily replay to debug, the size and scope of the data you have to process for some of those cases just exploded. Or you wrote N times, once for each of those, and then hopefully your multi-write transaction implementation was perfect!
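
To make the ordering problem concrete, here is a toy model, not real Kafka code. It assumes order is only preserved within a partition and that the partition is chosen from a single key. The partitioner below is a stand-in (a byte sum), not Kafka's actual hash, and all names are hypothetical.

```python
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical topic size


def partition_for(key):
    # Toy deterministic partitioner; a stand-in for key hashing.
    return sum(key.encode()) % NUM_PARTITIONS


def publish(events, key_field):
    """Append each event to the partition chosen by event[key_field]."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(event[key_field])].append(event)
    return partitions


def partitions_holding(partitions, field, value):
    """Which partitions must a consumer read to see every event for value?"""
    return sorted(
        pid
        for pid, log in partitions.items()
        if any(event[field] == value for event in log)
    )


events = [
    {"seq": 0, "customer": "c1", "product": "p1"},
    {"seq": 1, "customer": "c2", "product": "p1"},
    {"seq": 2, "customer": "c1", "product": "p2"},
    {"seq": 3, "customer": "c3", "product": "p1"},
]

by_customer = publish(events, "customer")
print(partitions_holding(by_customer, "customer", "c1"))  # [1]
print(partitions_holding(by_customer, "product", "p1"))   # [0, 1, 2]
```

Keyed by customer, everything for `c1` sits in one ordered log. The events for product `p1` are scattered across all three partitions, so a consumer that wants them in order has to read and merge the lot.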

[0] in fairness, a lot of applications also don't guarantee that they never drop requests at all, obviously. 500 and retry and hope that you don't run out of retries very often; if you do, it's just dropped on the ground and it's considered acceptable loss to have some of that for most companies/applications.
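
The request-side equivalent looks roughly like this. It is a minimal sketch assuming a fixed retry budget and no dead-letter store; `send_with_retries` and `make_flaky` are hypothetical helpers, not from any library.

```python
def send_with_retries(send, payload, max_attempts=3):
    """Call send(payload) up to max_attempts times.

    Returns True on success, False if every attempt failed and the
    payload was dropped.
    """
    for _ in range(max_attempts):
        try:
            send(payload)
            return True
        except ConnectionError:
            continue  # e.g. the server answered 500; try again
    return False  # retries exhausted: the request is dropped on the ground


def make_flaky(failures):
    """Hypothetical endpoint that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def send(payload):
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("500")

    return send


print(send_with_retries(make_flaky(2), {"id": 1}))  # True
print(send_with_retries(make_flaky(5), {"id": 2}))  # False
```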

I would say that requiring in-order events is a huge anti-pattern. What guarantee do you have that they were actually produced in-order and all the clocks are in-sync enough to know that without a doubt?
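
A small illustration of the clock point, under the assumption that two hosts stamp events with their own wall clocks and one clock runs 50 ms behind. The host names and skew values are invented for the example.

```python
# Assumed skew per host in milliseconds; host2's clock runs 50 ms behind.
SKEW_MS = {"host1": 0, "host2": -50}


def stamp(host, true_time_ms):
    """Timestamp a host would record for an event at true_time_ms."""
    return true_time_ms + SKEW_MS[host]


# In reality A happens first (t=1000), then B 20 ms later (t=1020).
events = [
    {"name": "A", "host": "host1", "ts": stamp("host1", 1000)},
    {"name": "B", "host": "host2", "ts": stamp("host2", 1020)},
]

by_timestamp = [e["name"] for e in sorted(events, key=lambda e: e["ts"])]
print(by_timestamp)  # ['B', 'A']
```

Sorting by recorded timestamp puts B before A even though A happened first, so "in order" by timestamp is only as good as the clocks.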
What are the causes of data loss?
Jepsen has written a fantastic article on this issue. I'm not sure if it has been fixed since then. https://aphyr.com/posts/293-call-me-maybe-kafka