| In my opinion this is pre-cloud thinking. It used to be that distributed systems were a big trade-off. They were operationally complex and they had limited APIs (NoSQL), but they scaled. The best solution used to be to build things on a non-scalable but easy-to-use and easy-to-run system, and then rewrite it later if it needed to scale (often in a big hurry). This is just not the case any more, though. Why? Two reasons: 1. We’ve gotten much better at distributed systems, so the APIs aren’t nearly as limited. You no longer have to choose between hard-to-use things like Hadoop/NoSQL and elegant but unscalable single-server databases. You can have both good abstractions and scale. 2. The cloud makes it possible to get systems as a service, so there should be way less ops than running a single-node system yourself. In the case of Kafka, I’m super biased as I’m one of the original authors, but I think the abstractions Kafka gives, the stream processing capabilities, connectors, etc. are just way better than a lot of the traditional solutions. Using something worse until you “need” Kafka might make sense on premises, but not in the cloud. Confluent offers a fully managed Kafka service, so you don’t do any of the upgrades, security patches, midnight pages, etc.; you just use the APIs. This is super affordable for the kind of simple apps the article describes. The price varies by cloud, but e.g. on GCP it starts at $0.11/GB for reads and $0.10/GB stored. That is a lot cheaper than using a single-node system and then rebuilding everything if you need to scale, and it also has lower operational overhead (effectively none) and a better interface/abstraction. I think this isn’t unique to Kafka, either. There are great managed systems that are built to scale for most of the kinds of data systems you would use: CockroachDB, Spanner, Aurora, Snowflake, Elasticsearch, BigQuery, etc. Basically, you can have nice things now, just like the big tech companies. |
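To put the quoted prices in perspective, here is a rough back-of-the-envelope sketch. It uses only the two GCP starting rates mentioned above ($0.11/GB read, $0.10/GB stored). The function name `estimate_monthly_cost` is hypothetical, treating the storage rate as a monthly charge is an assumption, and any charges not quoted in the comment (for example writes or base fees) are deliberately left out, so this is a partial estimate rather than a real bill.

```python
# Back-of-the-envelope estimate from the two GCP starting prices quoted above.
# Charges not mentioned in the comment (e.g. writes) are NOT modeled here.
READ_PRICE_PER_GB = 0.11     # USD per GB read (quoted in the comment)
STORAGE_PRICE_PER_GB = 0.10  # USD per GB stored (quoted; monthly billing is an assumption)


def estimate_monthly_cost(read_gb: float, stored_gb: float) -> float:
    """Return a partial monthly cost in USD for the given read and storage volumes."""
    if read_gb < 0 or stored_gb < 0:
        raise ValueError("volumes must be non-negative")
    return read_gb * READ_PRICE_PER_GB + stored_gb * STORAGE_PRICE_PER_GB


if __name__ == "__main__":
    # Hypothetical small app: 100 GB read and 50 GB retained in a month.
    cost = estimate_monthly_cost(read_gb=100, stored_gb=50)
    print(f"Estimated read + storage cost: ${cost:.2f}")
```

For a small app at these volumes the read and storage portion comes to a few tens of dollars at most, which is the scale the "super affordable" claim is pointing at.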