by haggy
2026 days ago
Can you point me at documentation for the fault tolerance of the system? A huge issue for streaming systems (and largely unsolved AFAIK) is being able to guarantee that counts aren't duplicated when things fail. How does Materialize handle the relevant failure scenarios in order to prevent inaccurate counts/sums/etc?

I think the right starter take is that Materialize is a deterministic compute engine, one that relies on other infrastructure to act as the source of truth for your data. It can pull data out of your RDBMS's binlog, out of Debezium events you've put into Kafka, out of local files, etc.
On failure and restart, Materialize leans on the ability to return to the assumed source of truth, again an RDBMS + CDC or perhaps Kafka. I don't recommend thinking about Materialize as a place to sink your streaming events at the moment (there is movement in that direction, because the operational overhead of things like Kafka is real).
The main difference is that unlike an OLTP system, Materialize doesn't have to make and persist non-deterministic choices about e.g. which transactions commit and which do not. That makes fault-tolerance a performance feature rather than a correctness feature, at which point there are a few other options as well (e.g. active-active).
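To make the "deterministic, so just go back to the source" point concrete, here is a toy sketch. It is not Materialize code; `SOURCE_LOG`, `materialize_counts`, and `run_with_crash` are hypothetical names standing in for a replayable source (a Kafka topic or CDC log) and a deterministic view over it. The only point is that recomputing from the source after a crash gives the same counts as an uninterrupted run, so nothing gets double counted.

```python
from collections import Counter

# Hypothetical stand-in for a durable, replayable source of truth
# (e.g. a Kafka topic or a CDC log). Each entry is (key, diff).
SOURCE_LOG = [
    ("page_a", +1),
    ("page_b", +1),
    ("page_a", +1),
    ("page_b", -1),  # a retraction, as a CDC stream might emit for a delete
    ("page_a", +1),
]


def materialize_counts(log, upto):
    """Deterministically fold the first `upto` entries into per-key counts."""
    counts = Counter()
    for key, diff in log[:upto]:
        counts[key] += diff
    # Drop keys whose count has been fully retracted.
    return {k: v for k, v in counts.items() if v != 0}


def run_with_crash(log, crash_at):
    """Process a prefix, 'crash' (losing in-memory state), then recover."""
    _lost_state = materialize_counts(log, crash_at)  # thrown away by the crash
    # Recovery does not resume from the lost state; it replays the source.
    return materialize_counts(log, len(log))


uninterrupted = materialize_counts(SOURCE_LOG, len(SOURCE_LOG))
recovered = run_with_crash(SOURCE_LOG, crash_at=3)

# Same input + deterministic computation = same answer, with no duplicates.
assert recovered == uninterrupted
print(recovered)
```

Replaying from the start is the simplest possible recovery; anything smarter (checkpoints, a hot standby) only changes how fast you get back to the same answer, not what the answer is.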
Hope this helps!