
No worries. Getting database stuff correct is important, and you have a right to know how things work before building anything on top of it. You don't want to build a house on a cracked foundation. I'm the same way, and I'm trying my best to engineer these things properly.

Here is a link that explains the algorithm pretty briefly: https://github.com/amark/gun/issues/87#issuecomment-13636276... It is in a terrible location that nobody would know to look in (I need to move it out into the wiki or something).

And to answer your question directly, here is how they compensate each other:

1. Timestamps are vulnerable to accidental or malicious clock drift. I can change my machine's local clock to be 2 years in the future. If you use timestamps to decide who "wins" in a conflict, my edits will win for the next 2 years. That sucks and is evil.

2. Vector clocks were invented to get around some of these problems. You increment a vector on every local change such that it is higher than the highest known vector. So Alice updates a value to "Hello World" at state 1, then to "Hello Mars" at state 2. If she then receives an update from Bob of "Hello Jupiter" at state 5, Alice has to jump all the way up to and past Bob to change the value - say "Hello Pluto" at state 6. This gets around the timestamp vulnerability, because even if Bob were to say the update is at state 999998, all Alice has to do is increment it again to 999999. She doesn't have to wait 2 years or corrupt her clock.

3. Vector clocks' vulnerability is that since the clock is relative to the machine, if the machine reboots it loses its clock. Even if the machine persists it, it has to play "catch up" when it comes back online - and while it is catching up, nothing stops two machines from accidentally incrementing to the same conflicting clock. At this point you are screwed, unless you implement some other deterministic resolution - and plenty do exist. The point is that vector clocks were designed for fairly permanent machines, but we now live in an ephemeral world where we might spin up a hundred servers to handle some load and then shut them down. If you do this, you lose the machines' vectors.

4. But timestamps don't have this problem. Machines spinning up and down usually do some sort of NTP sync that gives them a rough estimate of time - drift aside, they don't need to remember anything. So when you combine the two you get a vector timestamp relative to other vector timestamps. In other words, every update includes its local timestamp (which might have drift), but the receiving computer calculates a vector relative to its own local timestamp (which might also have drift). If the sending peer is being malicious, the computed vector will be large, which the receiving peer can then use to mitigate the timestamp exploit. Equally, you can have any number of ephemeral peers coming and going through the network without fear of conflicts occurring or losing vectors.
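
To make points 2 and 4 a bit more concrete, here is a rough Python sketch. It is not gun's actual code (that is in the repo), just an illustration of the two ideas. The names `next_state`, `Update`, `relative_vector` and `classify`, the 60-second `max_future` threshold, and the "suspect"/"accept" labels are all made up for this example.

```python
from dataclasses import dataclass


def next_state(local_state: int, highest_seen: int) -> int:
    """Point 2: jump to one past the highest state known so far."""
    return max(local_state, highest_seen) + 1


@dataclass
class Update:
    value: str
    timestamp: float  # sender's local clock in seconds, possibly drifted


def relative_vector(update: Update, local_now: float) -> float:
    """Point 4: the sender's timestamp measured against the receiver's own clock."""
    return update.timestamp - local_now


def classify(update: Update, local_now: float, max_future: float = 60.0) -> str:
    """Hypothetical policy: flag updates that claim to be far in the future."""
    if relative_vector(update, local_now) > max_future:
        return "suspect"  # large relative vector, so do not let it win outright
    return "accept"


# Alice is at state 2 and sees Bob's update at state 5, so she writes at state 6.
alice_state = next_state(2, 5)

# A peer whose clock is set 2 years ahead produces a large relative vector.
two_years = 2 * 365 * 24 * 3600
verdict = classify(Update("Hello Jupiter", 1000.0 + two_years), local_now=1000.0)
```

Here `alice_state` comes out as 6 and `verdict` as "suspect". Note that nothing in the sketch has to survive a reboot: the receiver only needs its own (roughly NTP-synced) clock, which is the whole point of combining the two. What you actually do with a suspect update is a separate design choice that this sketch leaves open.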

Does that make sense? I'll be presenting on these subjects at the conference, where Kyle Kingsbury is giving the keynote. So I'll have an opportunity to talk to him, and I'm hoping he'll also review the algorithm (the actual algorithm is in the code, which you should look at, and I linked to a brief explanation of it at the top of this post) and help me set up Jepsen tests.

Overall, I need a LOT more documentation on this, and I also want to formally verify it with TLA+ or Coq. We'll also be building a battle-testing suite to hammer on gun in real deployed environments and see where the problems lie. So please, give it a shot and slam me with any questions or problems you encounter. Have you seen this demo? https://medium.com/@marknadal/gun-0-2-0-pre-release-auto-rec...

Cheers!