by bmichel
2246 days ago
Thanks for sharing your experience with this rewrite. I have a few questions:

1. I'm a bit surprised that you don't persist the scenario of a failing test, only its seed. Does that mean that when you want to replay it, you have to redo the minimization phase? Or do you have a way to find a seed for the minimized scenario?
2. Do you have tests where you generate a set of operations and play them twice, once against the mocks and once against the real servers, to check that they produce the same results?
3. The article says "Note also the importance of the commit hash, as another type of “test input” alongside the seed: if the code changes, the course of execution may change too!". How do you ensure that a commit really fixes a bug, and doesn't just change the execution path to a happy path where the conditions for the bug are no longer met? By replaying a lot of tests, or by writing a new unit test that exhibits the bug to ensure reproducibility?
4. Do you think it's fair to say that CanopyCheck applies randomized testing at the unit-test level and Trinity applies it at the integration-test level?
1. It does redo the minimization phase, but the actual execution is extremely fast, so this cost is minimal. Storing test outputs gets pretty expensive when you are running millions of tests, and since there are very few failures, recomputing the minimization is worthwhile.
2. Yes and no! The article talks about this a little: the "heirloom" system does essentially this, and the "native" filesystem variant of Trinity runs the same Trinity tester code against a real filesystem. The "no" is due to an issue inherent to randomized testing: since any operation you perform can affect the RNG, the exact operation that is run for a particular seed can change if you swap out any part of the system. For regression tests, the operation sequence can be put into a separate, non-randomized test.
3. Both.
4. Testing in Nucleus is a sliding scale from "unit-test-like" to "integration-test-like" -- Trinity mocks plenty of functionality, while CanopyCheck simultaneously tests many different components. It would probably be more accurate to say that CanopyCheck tests a smaller subset of components with much greater fidelity, and Trinity tests as much of the sync engine as is practical.
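To illustrate point 1 (persisting only the seed and recomputing the minimization), here is a minimal Python sketch. Everything in it is hypothetical: the toy `run_ops` system with a planted bug, the `generate_ops` driver, and the greedy `minimize` shrinker are illustrations, not Nucleus, Trinity, or CanopyCheck code. The key assumption is that both generation and minimization are deterministic, so the seed (for a given version of the code) is enough to reproduce the minimized scenario.

```python
import random

OPS = ["inc", "dec", "reset"]  # hypothetical operations


def run_ops(ops):
    """Toy system under test (hypothetical): a counter that must never go
    negative. Returns True if the invariant holds, False on failure."""
    value = 0
    prev = None
    for op in ops:
        if op == "inc":
            value += 1
        elif op == "dec":
            # Planted bug: right after a reset, the guard is bypassed.
            if value > 0 or prev == "reset":
                value -= 1
        elif op == "reset":
            value = 0
        prev = op
        if value < 0:
            return False
    return True


def generate_ops(seed, length=50):
    """All randomness comes from one RNG seeded by `seed`, so the same seed
    always yields the same operation sequence (for the same code)."""
    rng = random.Random(seed)
    return [rng.choice(OPS) for _ in range(length)]


def minimize(ops):
    """Greedy shrinker: drop one op at a time while the failure still
    reproduces. It is deterministic, so rerunning it on the same input
    yields the same minimized scenario."""
    ops = list(ops)
    i = 0
    while i < len(ops):
        candidate = ops[:i] + ops[i + 1:]
        if not run_ops(candidate):
            ops = candidate  # still fails without ops[i]: drop it
        else:
            i += 1  # ops[i] is needed to reproduce the failure: keep it
    return ops


def find_failing_seed(max_seeds=10_000):
    """Try seeds in order and return the first one whose scenario fails."""
    for seed in range(max_seeds):
        if not run_ops(generate_ops(seed)):
            return seed
    return None


if __name__ == "__main__":
    seed = find_failing_seed()
    print("failing seed:", seed)
    # Replaying from the seed alone reproduces the same minimized scenario.
    print(minimize(generate_ops(seed)))  # ['reset', 'dec']
```

Because `minimize` is deterministic, calling `minimize(generate_ops(seed))` again later yields the same minimal scenario, so only the seed needs to be stored. A code change to `generate_ops` or `run_ops` would break that guarantee, which is why the commit hash matters alongside the seed.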
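The RNG caveat in point 2 can be illustrated with another hypothetical sketch: a driver that shares a single `random.Random` between choosing operations and a mock component. The operation names, the `mock_draws_rng` flag, and the `REGRESSION_OPS` list are invented for illustration. When the mock consumes extra randomness (as a swapped-in component might), the operations chosen for the same seed can diverge after the first step, whereas a pinned, non-randomized regression sequence does not depend on the RNG at all.

```python
import random

OPS = ["upload", "download", "rename", "delete"]  # hypothetical operation names


def generate_ops(seed, n=8, mock_draws_rng=False):
    """Choose n operations from one shared RNG seeded by `seed`.

    If `mock_draws_rng` is True, a (hypothetical) mock component also draws
    from the same RNG after each step, e.g. to pick a simulated latency.
    """
    rng = random.Random(seed)
    ops = []
    for _ in range(n):
        ops.append(rng.choice(OPS))
        if mock_draws_rng:
            rng.random()  # extra draw shifts the RNG state for later choices
    return ops


# A regression test pins the exact operation sequence instead of a seed, so it
# replays the same operations no matter how RNG consumption changes elsewhere.
REGRESSION_OPS = ["upload", "rename", "rename", "delete"]  # hypothetical


def run_regression(apply_op):
    """Apply each pinned operation in order via the caller-supplied function."""
    for op in REGRESSION_OPS:
        apply_op(op)


if __name__ == "__main__":
    print(generate_ops(42))
    # Same first operation; later operations may differ.
    print(generate_ops(42, mock_draws_rng=True))
    log = []
    run_regression(log.append)
    print(log)
```

The design point is that the seed only identifies a scenario relative to a fixed system; once a failure is understood, pinning the operation list keeps the regression test meaningful even after the mocks or the RNG consumption pattern change.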