Hi! I'm one of the two authors here. At Materialize, we're definitely of the 'we are a bunch of voices, we are people rather than corp-speak, and you get our largely unfiltered takes' flavor. This is my (and George's from Fivetran) take. In particular, this is not Frank's take, as you attribute below :)

> SQL is declarative, reactive Materialize streams are declarative on a whole new level.

Thank you for the kind words about our tech, I'm flattered! That said, this dream is downstream of Kafka. Most of our quibbles with the Kafka-as-database architecture have to do with the fact that it neglects the work that needs to be done _upstream_ of Kafka. That work is best done with an OLTP database. Funnily enough, neither of us is building an OLTP database, but this piece is largely a defense of OLTP databases (if you're curious, yes, I'd recommend CockroachDB) and their virtues at the head of the data pipeline.

Kafka has its place, and when it's used downstream of CDC from said OLTP database (using, e.g., Debezium), we could not be happier with it (and we say so).

The best example is foreign key checks. The Kafka-as-database architecture is not good if you ever need to enforce foreign key checks (which translates to checking a denormalization of your source data _transactionally_ with deciding whether to admit or deny an event). This is something that you may not need in your data pipeline on day 1, but adding it later is a trivial schema change with an OLTP database, and exceedingly difficult with a Kafka-based event-sourced architecture.

> Normally you'd have single writer instances that are locked to the corresponding Kafka partition, which ensure strong transactional guarantees, IF you need them.

This still does not deal with the use case of needing to add a foreign key check. You'd have to:

1. Log "intents to write" rather than the writes themselves in Topic A.
2. Have a separate denormalization computed and kept in a separate Topic B, which can be read from. This denormalization needs to be read until the intent propagates from Topic A.
3. Convert those intents into commits.
4. Deal with all the failure cases in a distributed system, e.g. cleaning up abandoned intents, etc.

If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds. And hopefully, yes, have a reactive declarative stack downstream of that as well!
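To make the "trivial schema change" side of that comparison concrete, here is a minimal sketch of a foreign key check deciding whether to admit or deny a write inside a single transaction. It uses SQLite from Python's standard library purely as a stand-in for an OLTP database; the `customers` and `orders` tables and the `open_db` and `admit_order` helpers are hypothetical names invented for this illustration, not anything from the original piece.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a foreign key from orders to customers."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id))"
    )
    conn.commit()
    return conn


def admit_order(conn: sqlite3.Connection, order_id: int, customer_id: int) -> bool:
    """Try to admit an order; the database denies it if the customer is missing."""
    try:
        # The connection context manager commits on success and rolls back on error,
        # so the check and the write happen in the same transaction.
        with conn:
            conn.execute(
                "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
                (order_id, customer_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = open_db()
    with db:
        db.execute("INSERT INTO customers (id) VALUES (1)")
    print(admit_order(db, 100, 1))    # customer 1 exists
    print(admit_order(db, 101, 999))  # customer 999 does not exist
```

The point of the sketch is that the admit-or-deny decision and the write share one transaction, which is roughly what steps 1 through 4 above have to rebuild by hand on top of Kafka topics.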
People do do this. I have done this. I wish I had been more principled with the error paths. It got there _eventually_.
It was a lot of code and complexity to ship a feature which in retrospect could have been nearly trivial with a transactional database. I'd say months rather than days. I won't get those years of my life back.
The products were built on top of Kafka, Cassandra, and Elasticsearch where, over time, there was a desire to maintain some amount of referential integrity. The only reason we bought into this architecture at the time was horizontal scalability (not even multi-region). Kafka, sagas, and 2PC at the "application layer" can work, but you're going to spend a heck of a lot on engineering.
It was this experience that drove me to Cockroach and I've been spreading the good word ever since.
> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.
This is the next chapter in the gospel of the distributed transaction.