by benschulz
2181 days ago
My reading of the article's introduction is that Redis is adding this feature and is (among other things, I'm sure) paying Jepsen to test it. So this is them having tests.

> If you follow basic software engineering principles, you'll find distributed systems easier to approach.

When I implemented Paxos, I had tests, and when they failed they spat out an exact trace of what happened, in what order, and on which node. Sometimes it was still excruciating to figure out what had gone wrong. Here's[1] a comment you can think of as a bug tombstone: even with a trace in hand, it took me half a day to analyze that issue.

[1]: https://github.com/benschulz/paxakos/blob/ee051ff67b5da6f287...
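For readers who haven't built one, here is a minimal sketch of the "tests that spit out an exact trace" idea, assuming a simulated cluster whose message delivery order is chosen by a seeded RNG. This is a common approach and not necessarily how paxakos does it; the toy "everyone adopts the maximum value" protocol and the names `TraceRecorder`, `run_simulation`, and `check` are hypothetical, chosen only to illustrate recording a totally ordered, per-node trace and attaching it to the failure.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TraceRecorder:
    """Collects an ordered log of (step, node, event) tuples across all nodes."""
    events: list = field(default_factory=list)

    def record(self, node: str, event: str) -> None:
        # The step counter is global, so the trace gives a total order
        # of everything that happened during the simulated run.
        self.events.append((len(self.events), node, event))

    def dump(self) -> str:
        return "\n".join(f"{step:04d} [{node}] {event}" for step, node, event in self.events)


def run_simulation(seed: int, trace: TraceRecorder) -> dict:
    """Hypothetical toy protocol: each node broadcasts its value and keeps the
    maximum it has seen. The seeded RNG picks the delivery order, so rerunning
    a failing seed reproduces exactly the same interleaving."""
    rng = random.Random(seed)
    initial = {"a": 1, "b": 5, "c": 3}
    # One in-flight message from every node to every other node.
    inbox = [(src, dst, val) for src, val in initial.items() for dst in initial if dst != src]
    state = dict(initial)
    while inbox:
        src, dst, val = inbox.pop(rng.randrange(len(inbox)))
        trace.record(dst, f"recv {val} from {src}")
        if val > state[dst]:
            state[dst] = val
            trace.record(dst, f"adopt {val}")
    return state


def check(seed: int) -> None:
    trace = TraceRecorder()
    state = run_simulation(seed, trace)
    # Invariant: all nodes agree. On failure, the full trace goes into the error message.
    if len(set(state.values())) != 1:
        raise AssertionError(f"nodes disagree with seed {seed}: {state}\n{trace.dump()}")


if __name__ == "__main__":
    for seed in range(100):
        check(seed)
```

The seed is what makes such a trace useful: when a check fails, rerunning that seed replays the identical interleaving, so you can add logging or step through it. As the comment says, that still doesn't make the analysis itself easy.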
Full-scale black-box testing of a database system is similar to dogfooding: you only reach for it once you're highly confident you've exhausted what unit and integration tests can find. It's clear this project did not start with exhaustive unit tests.
It reminds me a bit of FoundationDB, which is also a terrible program nobody should entrust with data they ever want to see again. The first time I tried to use it, it ran out of memory and crashed in about ten seconds. I tracked down the problem: their huge-page-aware allocator, which has no tests, had never actually been run by anybody on a machine with huge pages. It was a core library of a released database that had never been executed by anyone. This Redis thing is the same: nobody had ever typed "RAFT SET foo bar"; if they had, they would have seen the problem right away.