Hacker News new | ask | show | jobs
by zond 4887 days ago
I would have preferred other algorithms, but there was a strict need for 1) sorted data and 2) that 2 identical trees had the same structure (for the merkle element). Not many structures were left to choose from, and radix seemed to work well enough.
2 comments

My concern is that you're mixing two different concepts.

Do you really need data to be ordered? Why do you care about having "close" data on the same node?

The synchronization uses Merkle trees, and they require ordered data since they hash contiguous data into a tree of hashes.

And to avoid having a separate structure for the Merkle trees I just hash all nodes in the main tree, and compare the hashes to find differences.

Thus the same content must have the same structure, or the comparisons won't work.

I think I didn't make myself clear:

- You say However, since it could be very useful for users of a database to store ordered data, or to wilfully concentrate certain data on certain parts of the cluster, god does not force the user to hash the keys. -> why do you care about how the data is actually stored? - To map keys to values, a mapping structure is needed. For infrastructural reasons (synchronization and cleaning) as well as for functionality of different kinds, we need a sorted mapping, and it has to be deterministically structured. -> why?

> why do you care about how the data is actually stored?

I just said I don't. 'god does not force the user to hash the keys'.

> > To map keys to values, a mapping structure is needed. For infrastructural reasons (synchronization and cleaning) as well as for functionality of different kinds, we need a sorted mapping, and it has to be deterministically structured.

> why?

Functionality: To be able to return the first or the n'th entry it has to be ordered.

Synchronization/cleaning: To be able to hash element 0000-000f we need an efficient way to fetch a segment of elements, thus it again has to be ordered.

To optimize the hashing so that I don't have to keep two separate data structures I keep the hashes in the nodes of the sorted data structure. Thus the structure has to be deterministically structured or the hashes won't be equal even if the trees contain the same data.

Thank you for your answers.
Other comments, timestamps will not work as you expect if you have a network partition and you are using your own time synchronization mechanism. Use NTP instead.
I don't think I understand what you mean. Or you don't understand how my timestamps and time synchronization works?