This is a bit dumbed down, and it ignores the domain terminology required to properly discuss the trade-offs here (which is puzzling given that it links to a post by Aphyr, where you can find incredibly thorough discussions of isolation levels and anomalies).

> The fundamental problem with using Kafka as your primary data store is it provides no isolation.

This is false. I can only assume the author doesn't know about the Kafka transactions feature. To be specific, Kafka's transaction machinery offers read-committed isolation, and you get read-uncommitted by default if you don't opt in to that transaction machinery (the docs: https://kafka.apache.org/0110/javadoc/index.html?org/apache/...).

Depending on your workload, read-committed might be sufficient for correctness, in which case you can absolutely use Kafka as your database. Of course, proving that your application is sound with just read-committed isolation can be challenging, not to mention testing that it continues to be sound as new features are added. Because of that, I think the underlying point of the article is probably correct, in that you probably shouldn't use Kafka as your database, but for certain applications and use cases it's a completely valid system design choice.

More generally, this is an area that many applications get wrong by using the wrong isolation levels, because most frameworks encourage incorrect implementations through their unsafe defaults; see e.g. the classic "Feral concurrency control" paper: http://www.bailis.org/papers/feral-sigmod2015.pdf

So I think the general message of "don't use Kafka as your DB unless you know enough about consistency to convince yourself that read-committed isolation is and will always be sufficient for your use case" would be more appropriate (though it's certainly a less snappy title).
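
To make the read-committed vs. read-uncommitted distinction concrete, here is a rough toy model in Python. It is not the Kafka client API: `ToyLog` and `Record` are hypothetical names invented for illustration, and the model ignores partitions, producer fencing, and the control markers that a real log stores. It only tries to capture the visible behaviour as I understand it: a read-uncommitted reader sees everything that was appended, while a read-committed reader skips aborted writes and stops at the first record that belongs to a still-open transaction. In a real consumer the corresponding knob is the `isolation.level` setting (`read_committed` or `read_uncommitted`).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    offset: int
    value: str
    txn: Optional[str]  # None means a plain, non-transactional write


@dataclass
class ToyLog:
    """Hypothetical in-memory stand-in for one partition (not a Kafka API)."""
    records: list = field(default_factory=list)
    status: dict = field(default_factory=dict)  # txn id -> open/committed/aborted

    def append(self, value, txn=None):
        if txn is not None:
            self.status.setdefault(txn, "open")
        self.records.append(Record(len(self.records), value, txn))

    def commit(self, txn):
        self.status[txn] = "committed"

    def abort(self, txn):
        self.status[txn] = "aborted"

    def read(self, isolation="read_uncommitted"):
        if isolation == "read_uncommitted":
            # Everything appended is visible, including aborted writes.
            return [r.value for r in self.records]
        if isolation != "read_committed":
            raise ValueError(f"unknown isolation level: {isolation}")
        visible = []
        for r in self.records:
            state = "committed" if r.txn is None else self.status[r.txn]
            if state == "open":
                break  # never read past the first still-open transaction
            if state == "committed":
                visible.append(r.value)  # aborted records are skipped
        return visible


log = ToyLog()
log.append("a")             # non-transactional write
log.append("b", txn="t1")
log.append("c", txn="t2")
log.abort("t1")             # t2 is still open at this point

print(log.read("read_uncommitted"))  # ['a', 'b', 'c']
print(log.read("read_committed"))    # ['a']
log.commit("t2")
print(log.read("read_committed"))    # ['a', 'c']
```

Note what this does not give you: nothing here prevents two writers from reading the same committed state and both appending conflicting updates, which is exactly the kind of anomaly you have to rule out yourself before deciding read-committed is enough.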