| HN Mirror

To follow up on this a little bit--many of my clients do their own Jepsen testing, or have analogous tests using their own testing toolkit. When they engage me, the early part of my work is reviewing their existing tests, looking for problems, and then expanding and elaborating on those tests to find new issues in the DB.

Companies are finding bugs using Jepsen internally, which is great! But when they hire me, I'm usually able to find new behaviors. Some of that is exploring the concurrency and scheduling state space, some of it is reviewing code and looking for places where tests would fail to identify anomalies, some of it is designing new workloads or failure modes, and some is reading the histories and graphs, and using intuition to guide my search. I've been at this for five years now (gosh) and built up a good deal of expertise, and coming at a system with an outsider's perspective, and a different "box of tools", helps me explore the system in a different way.

I do work with my clients to determine what they'd like to focus on, and how much time I can give, but by and large, my clients let me guide the process, and I think the Jepsen analyses I've published are reasonably independent. If there's something I think would be useful to test, and we don't have time or the client isn't interested in exploring it, I note it in the future work section of the writeup.

It's not like clients are saying "please stick ONLY to these tests, we want a positive result." One of the things I love about my job is how much the vendors I work with care about fixing bugs and doing right by their users, and I love that I get to help them with that process. :)