by epdlxjmonad
2244 days ago
For debugging a distributed system, it may be perfectly fine to stick with the traditional approach that everyone is familiar with: testing, log analysis, and visualization. Yes, there are advanced techniques such as formal verification and model checking, but depending on the complexity of the target distributed system, applying them may be practically out of the question or just not worth your time (unless you are in a research lab or supported by FAANG). In other words, there may be nothing inherently inefficient about sticking to the traditional way, because distributed systems are hard to debug by definition and there is (and will be) no panacea.

We have gone through the pain of testing and debugging a distributed system that has been under development for the past 5 years. We investigated several fancy ways of debugging distributed systems, such as model checking and formal verification. In the end, we decided to use the traditional way, and we are more or less happy with it. The decision was made mostly because 1) the implementation is too complex (a lot of Java and Scala code) to allow formal techniques; 2) the traditional way can still be very effective when combined with careful design and thorough testing.

Before building the distributed system, we worked on formal verification using program logic and were knowledgeable about a particular sub-field. From our experience, I guess it would require a PhD dissertation to successfully apply formal techniques to debugging our distributed system. The summary of our experience is at https://www.datamonad.com/post/2020-02-19-testing-mr3/.