|
|
|
|
|
by technion
3469 days ago
|
|
I've been referred to this paper before when asking this question. Whilst it does discuss that Amazon used TLA+ in nontrivial circumstances, it's still a long mental leap to me to get to "TLA+ for AWS S3 looked like this, and this is the sort of bug it helped quash". It'd be great to see some of that sort of thing. |
|
The kind of bugs you find when verifying distributed systems is, if this machine was in that state and that message was in transit, and then the other machine failed, then data was lost".
Basically, because formal verification of this kind ensures the specification is correct, it will find any kind of bug there is. But it's those bugs that are either a result of a severe flaw in the design, or bugs that would be hard to find in testing but their results may be serious, that make this worthwhile.