by MauranKilom
867 days ago
Two big question marks after reading this and the linked home page (partially pointed out in other comments):

- If there's a flaw in your application code that causes a crash (as is the motivating example in the essay), then restoring the entire program into the state it was in just before the crash happened would just cause it to crash again ad infinitum. Sure, this model helps against "my VM instance got preempted", but that's a pretty different category of "crash" (and also notably unrelated to supervision trees).

- "External" state (like an API endpoint being down/returning gibberish) can be part of the reason why your program got into a bad state. In fact, that's disproportionately likely, since external "weirdness" is comparatively hard to cover exhaustively in tests. In such a situation, the suggested computation model would never be able to recover even when restarted, because it would forever retain the (bad) API response. Effectively, this is just caching all the non-pure effects of your computation, and we all know about cache invalidation being a hard problem...
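To make the second point concrete, here is a minimal sketch of a journal that caches the result of each non-pure step and replays it on restart. The `Journal` class, `workflow`, `run`, and the fake API responses are all hypothetical names made up for illustration; this is not the API of the system under discussion, only an assumption about how such replay could behave.

```python
import json

class Journal:
    """Hypothetical effect journal: stores each side-effect result in order."""
    def __init__(self, entries=None):
        self.entries = list(entries or [])
        self.cursor = 0

    def effect(self, fn):
        # On replay, return the recorded result instead of re-running fn.
        if self.cursor < len(self.entries):
            result = self.entries[self.cursor]
        else:
            result = fn()
            self.entries.append(result)
        self.cursor += 1
        return result

def workflow(journal, fetch):
    body = journal.effect(fetch)      # non-pure step, journaled
    return json.loads(body)["price"]  # raises on a gibberish body

def run(entries, fetch):
    journal = Journal(entries)
    try:
        return "ok", workflow(journal, fetch), journal.entries
    except (ValueError, KeyError) as exc:
        return "crashed", type(exc).__name__, journal.entries

# First run: the (made-up) API returns gibberish and the workflow crashes.
status1, detail1, saved = run([], lambda: "<html>502</html>")
# Restart: the API has recovered, but replay reuses the journaled response.
status2, detail2, _ = run(saved, lambda: '{"price": 42}')
# Only discarding the bad entry (i.e. invalidating the cache) helps.
status3, detail3, _ = run([], lambda: '{"price": 42}')
```

In this sketch the second run crashes exactly like the first even though the endpoint is healthy again, because the bad response is part of the restored state.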
|
The hardest part was of course managing external state and journaling exactly where you had got to with external transaction APIs. Further backend reconciliation was available to flag this (and avoid Post Office scenarios).
Note that French NF525 almost mandates this design, at least for point-of-sale systems: every financial transaction has to be durably written for tax auditing purposes.
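The "journaling exactly where you had got to" idea can be sketched roughly as follows, assuming an append-only log with an intent record written before each external call and a completion record written after it. `TxJournal`, `charge`, and the transaction IDs are hypothetical; the in-memory list stands in for durable storage, and a real system would need an fsync'd file or database.

```python
class TxJournal:
    """Hypothetical append-only journal of external transaction steps."""
    def __init__(self):
        self.records = []  # stand-in for durable storage

    def append(self, tx_id, state):
        self.records.append((tx_id, state))

    def unresolved(self):
        # Transactions whose latest record is an intent with no outcome.
        last = {}
        for tx_id, state in self.records:
            last[tx_id] = state
        return sorted(tx for tx, state in last.items() if state == "intent")

def charge(journal, tx_id, call_api):
    journal.append(tx_id, "intent")  # written before touching the API
    call_api(tx_id)                  # may crash or time out
    journal.append(tx_id, "done")    # written only after a confirmed reply

journal = TxJournal()
charge(journal, "tx-1", lambda tx: None)

def flaky(tx):
    raise ConnectionError("lost reply")  # outcome unknown to the caller

try:
    charge(journal, "tx-2", flaky)
except ConnectionError:
    pass

# Anything listed here is a candidate for backend reconciliation.
needs_reconciliation = journal.unresolved()
```

The point of the two-record scheme is that a restart never has to guess: a transaction with an intent but no completion is flagged for reconciliation rather than blindly retried or silently dropped.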