| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by j-pb 2021 days ago

Hey Arjun, Thanks for taking the time to reply! I didn't mean to suggest that this was Franks opinion, I merely explained to the other user, who seemed to have a more negative attitude towards Materialize, why I am extremely excited about it, namely Frank being the CTO, and him having an extremely good track record in terms of research and code.

I think the general gist of "use an OLTP database as your write model if you don't absolutely know what you're doing" is completely sane advice, however I think there are far more nuanced (and in a sense also honest) arguments that can be made.

I think the architecture you've sketched is over engineered for what you'd need for the task. So here's what I'd build for your inventory example, IFF the inventory was managed for a company with a multi Terra Items Inventory that absolutely NEEDS horizontal scaling:

One event topic in Kafka, partitioned along the ID of the different stores whose inventory is to be managed. This makes the (arguably) strong assumption that inventory can never move between stores, but for e.g. a Grocery chain that's extremely realistic.

We have one writer per store ID partition, which generates the events and enforces _serialisability_, with a hot writer failover that keeps up do date and a STONITH mechanism connecting the two. All writing REST calls / GraphQL mutations for its store-ID range, go directly to that node.

The node serves all requests from memory, out of an immutable Data-structure, e.g. Immutable.js, Clojure Maps, Datascript, or an embedded DB that supports transactions and rollbacks, like SQLite.

Whenever a write occurs, the writer generates the appropriate events, applies them to its internal state, validates that all invariants are met, and then emits the events to Kafka. Kafka acknowledging the write is potentially much quicker than acknowledging an OLTP transaction, because Kafka only needs to get the events into the memory of 3+ machines, instead of written to disk on 1+ machine (I'm ignoring OLTP validation overhead here because our writer already did that). Also your default failure resistance is much higher than what most OLTP systems provide in their default configuration (e.g. equivalent to Postgres synchronous replication).

Note that the critical section doesn't actually have to be the whole "generate event -> apply -> validate -> commit to kafka" code. You can optimistically generate events, apply them, and then invalidate and retry all other attempts once one of them commits to Kafka. However that also introduces coordination overhead that might be better served mindlessly working off requests one by one.

Once the write has been acknowledged by Kafka, you swap the variable/global/atom with the new immutable state or commit the transaction, and continue with the next incoming request.

All the other (reading) request are handled by various views on the Kafka Topic (the one causing the inconsistencies in the article). They might be lagging behind a bit, but that's totally fine as all writing operations have to go through the invariant enforcing write model anyways. So they're allowed to be slow-ish, or have varying QOS in terms of freshness.

The advantage of this architecture is that you have few moving parts, but those are nicely decomplected as Rich Hickey would say, you have use the minimal state for writing which fits into memory and caches, you get 0 lock congestion on writes (no locks), you make the boundaries for transactions explicit and gain absolutely free reign on constraints within them, and you get serialisability for your events, which is super easy to reason about mentally. Plus you don't get performance penalties for recomputing table views for view models. (If you don't use change capture and Materialize already, which one should of course ;] )

The two generals problem dictates that you can't have more than a single writer in a distributed system for a single "transactional domain" anyways. All our consensus protocols are fundamentally leader election. The same is true for OLTP databases internally (threads, table/row locks e.t.c.), so if you can't handle the load on a single 0 overhead writer that just takes care of its small transactional domain, then your database will run into the exact same issues, probably earlier.

Another advantage of this that so far has gone unmentioned is that if allows you to provide global identifiers for the state in your partitions that can be communicated in side-effect-full interactions with the outside world. If your external service allows you to store a tiny bit of metadata with each effect-full API call then you can include the offset of the current event and thus state. That way you can subsume the external state and transactional domain into the transactional domain of your partition.

Now, I think that's a much more reasonable architecture, that at least doesn't have any of the consistency issues. So let's take it apart and show why the general populace is much better served with an OLTP database:

- Kafka is an Ops nightmare. The setup of a Cluster requires A LOT of configuration. Also Zookeper urgh, they're luckily trying to get rid of it, but I think they only dropped it this year, and I'm not sure how mature it is.

- You absolutely 100% need immutable Data-structures, or something else that manages Transactions for you inside the writer. You DO NOT want to manually rollback changes in your write model. Defensive copying is a clutch, slow, and error prone (cue JS, urgh...: {...state}).

- Your write model NEEDS to fit into memory. That thing is the needles eye that all your data has to go through. If you run the single event loop variant, latency during event application WILL break you. If you do the optimistic concurrency variant performing validation checks might be as or more expensive than recomputing the events from scratch.

- Be VERY weary of communications with external services that happen in your write model. They introduce latency, and they break your transactional boundary that you set up before. To be fair OLTPs also suffer from this, because it's distributed consistency with more than one writer and arbitrary invariants which this universe simply doesn't allow for.

- As mentioned before, it's possible to optimistically generate and apply events thanks to the persistent data structures, but that is also logic you have to maintain, and which is essentially a very naive and simple embedded OLTP, also be weary of what you think improves performance vs. what actually improves it. It might be better to have 1 cool core, than 16 really hot ones that do wasted work.

- If you don't choose your transactional domains well, or the requirements change, you're potentially in deep trouble. You can't transact across domains/partitions, if you do, they're the same domain, and potentially overload a single writer.

- Transactional domains are actually not as simple as they're often portrayed. They can nest and intersect. You'll always need a single writer, but that writer can delegate responsibility, which might be a much cheaper operation than the work itself. Take bank accounts as an example. You still need a single writer/leader to decide which account ID's are currently in a transaction with each other, but if two accounts are currently free that single writer can tie them into a single transactional domain and delegate it to a different node, which will perform the transaction and write and return control to the "transaction manager". A different name for such a transaction manager is an OLTP (with Row-Level locking).

- You won't find as many tutorials, if you're not comfortable reading scientific papers, or at least academic grade books like Martin Kleppmanns "Designing Data Intensive Applications" don't go there.

- You probably won't scale beyond what a single OLTP DB can provide anyways. Choose tech that is easy to use and gives you as many guarantees as possible if you can. With change capture you can also do retroactive event analytics and views, but you don't have to code up a write-model (and associated framework, because let's be honest this stuff is still cutting edge, and really shines in bespoke custom solutions for bespoke custom problems).

Having such an architecture under your belt is a super useful tool that can be applied to a lot of really interesting and hard problems, but that doesn't mean that it should be used indiscriminately.

Afterthought; I've seen so many people use an OLTP but then perform multiple transactions inside a single request handler, just because that's what their ORM was set to. So I'm just happy about any thinking that people spend on transactions and consistency in their systems, in whatever shape or form, and I think making the concepts explicit instead of hiding them in a complex (and for many unapproachable) OLTP/RDBMS monster helps with that (if Kafka is less of a monster is another story).

I think it's also important to not underestimate the programmer convenience that working with language native (persistent) data-structures has. The writer itself in its naive implementation is something that one can understand in full, and not relying on opaque and transaction breaking ORMs is a huge win.

PS: Plz start a something collaboratively with ObservableHQ, having reactive notebook based dashboards over reactive Postgres queries would be so, so, so, so awesome!