| > In my experience with two-phase commits, if the system crashes before a transaction is fully processed and committed, it should be fully reprocessed (from scratch) once the server restarts. Then the problem becomes, in a fully distributed system how do you know which was the last successful commit before the system crashed? That 'settled' flag you mention may be set in the coordinator node's storage just before the coordinator crashed, but not communicated to any other node because of the crash. So the other nodes have to wait for the coordinator to restart, and replay its storage, to find out if that transaction was successful. That pause can be quite long, especially if it involves a coordinator rebooting, and really long if it involves resynchronising a RAID, checking filesystems/DB integrity, etc. The other nodes could decide it doesn't matter, exclude the coordinator from the cluster, and reprocess starting from an older transaction. But then they have to agree which older transaction - a distributed consensus problem. This is certainly possible, but I wouldn't call it simple, and it wouldn't be two-phase commit any more. Restarting older transactions tends to be visible to clients as well, because they may see transaction retries, and it's often desirable to minimise the number of those from a distributed database for minor failures (such as one node harmlessly going offline). Among other things, they can cause load spikes in the large system, and may exercise retry corner cases that don't come up normally, surfacing bugs that shouldn't be there in application code, but are. |