Hacker News new | ask | show | jobs
by rbtying 2246 days ago
I'm not Sujay, but I worked on this system once upon a time too! (not anymore)

1. It does redo the minimization phase, but the actual execution is extremely fast, so this cost is minimal. Storing test outputs gets pretty expensive when you are running millions of tests, and since there are very few failures, recomputing this is worthwhile

2. Yes and no! the article talks about this a little, but the "heirloom" system does essentially this, and the "native" filesystem variant of Trinity runs the same Trinity tester code against a real filesystem. The "no" is due to the issues with randomized testing -- since any operation that you do can affect the RNG, the exact operation that is run for a particular seed can change if you swap any part of the system. For regression tests, the operation sequence can be put into a separate, non-randomized test.

3. both

4. Testing in Nucleus is a sliding scale from "unit-test-like" to "integration-test-like" -- Trinity is mocking plenty of functionality; CanopyCheck is simultaneously testing many different components. It would probably be more accurate to say that CanopyCheck is testing a smaller subset of components with much greater fidelity, and Trinity is testing as much of the sync engine is practical.

1 comments

Thanks for the answers, I appreciate that.

If you have time, I have another round of questions:

1. Did you try formal methods like TLA+ on the client? I think that the logic covered by CanopyCheck may be a nice target.

2. Do you have some tests with several clients running at the same time on a shared directory? In particular, I think of the termination invariant where the clients are fighting because several users have reorganized the directory by moving a lot of stuff, and each client is trying to converge in a different direction (ie they are making operations that cancelled the ones made by other clients).

3. The article says "In the Nucleus data model, nodes are represented by a unique identifier". Does it happen that a node has to change its identifier? For example, in a scenario like this one:

      (ada) $ offline
    (grace) $ offline
    (grace) $ mv ~/Dropbox/shared/TODO.txt ~/Dropbox/private-grace/
      (ada) $ mv ~/Dropbox/shared/TODO.txt ~/Dropbox/private-ada/
      (ada) $ echo 'foo' >> ~/Dropbox/private-ada/TODO.txt
    (grace) $ echo 'bar' >> ~/Dropbox/private-grace/TODO.txt
      (ada) $ online
    (grace) $ online
1. We thought about this at one point, IIRC, and various parts of Dropbox sync have been formally modeled. But CanopyCheck and Trinity are useful in part because they test the real, production code -- the hash-indexing bug that Sujay mentioned elsewhere on this page is an implementation error, not a design error.

2. Kind of. There are tests for interactions between instances (in particular, between different Dropbox folders on the same machine), but running Dropbox twice on the same folder is explicitly not supported.

3. Yes, this happens. There's logic to handle these changes, since some applications expect to use in-place edits and others swap in a new file -- you don't actually need grace to have an interesting situation. You can experiment with this yourself in a Dropbox folder :)