Hacker News new | ask | show | jobs
by inaseer 2118 days ago
Concurrency is hard and we have very poor support for testing correctness of concurrent and distributed systems. Language abstractions help but they aren't nearly enough (as evidenced by this post). My team at Microsoft leverages Coyote to check the safety of our services against such subtle race conditions. We blogged about using it to reliably reproduce and fix a very subtle bug in a bounded buffer implementation over at https://cloudblogs.microsoft.com/opensource/2020/07/14/extre...

If you're using .NET in your projects, you can start taking advantage of such tools _today_. I would like for such tools and testing techniques to become more and more common place in the industry as concurrent and distributed systems are _hard_ and we should use all the help we can get.

2 comments

Go comes with thread sanitizer, which you can enable with go test -race ... If your unit test exercises a race condition, this will blow up your test with stack traces of the data race.

It sounds a bit like Coyote, which also looks very useful for C# applications.

Neat to learn about thread sanitizer. It sounds similar to another tool from Microsoft Research called Torch (https://www.microsoft.com/en-us/research/project/torch/) which automatically instruments binaries to detect data races. Coyote is similar in some ways but different in others. Coyote serializes the execution of the entire program (running one task at a time), exploring one set of interleavings and then rewinding, and then exploring another set of interleavings, hoping to hit hard-to-find safety and liveness bugs. In addition to finding concurrency bugs in one isolated process, we use it to find bugs in our distributed system by effectively running our entire distributed system in one process and having Coyote explore the various states our system can be in. It sounded mind-boggingly cool when I first came across this way of testing distributed systems through Foundation DB (https://www.youtube.com/watch?v=4fFDFbi3toc); we're emulating this kind of testing in our distributed system through Coyote. And unlike Foundation DB which had to develop their own variant of C++ to be able to do this kind of testing (kudos to them for doing it), Coyote allows us to do it on regular C# programs written using async/await asynchrony and benefit from decades of Microsoft Research in exploring large state spaces effectively.
link to the tla+ model of the situation and the proof that proposed solution would work: https://lobste.rs/s/ntati1/even_go_concurrency_is_still_not_...