by Rochus 2117 days ago
What is the use case? Why is it important that "All keys, values are backed by blake 256 bit checksum"?

It seems to be intended as the backend database for the Dero blockchain smart contract platform: https://medium.com/deroproject/graviton-zfs-for-key-value-st...

The post claims: "The features included in Graviton provide the missing functionality that prevented Stargate RC1 from reaching deployment on our mainnet."

I'm not sure, but I guess that this checksumming is relevant for storing the Merkle trees encoding the blockchain. I don't know why the previous choice of database wasn't suitable.

ZFS stores checksums of its data blocks to detect bit rot. Since they are comparing their database to ZFS, I guess it stores the checksums for the same reason. If bit rot occurs, you don't need to discard the entire database, just the affected entry. If the entry has been there for some time, you might even be able to restore it from a backup.
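To make the per-entry idea concrete, here is a minimal sketch of a store that keeps a 256-bit checksum next to every value. It is not Graviton's code: the `CheckedStore` class and the `checksum` helper are hypothetical names, and BLAKE2s from Python's `hashlib` stands in for whichever BLAKE variant Graviton actually uses.

```python
import hashlib


def checksum(key: bytes, value: bytes) -> bytes:
    """Return a 32-byte BLAKE2s digest covering both key and value."""
    h = hashlib.blake2s(digest_size=32)
    # Length-prefix the key so (b"ab", b"c") and (b"a", b"bc") hash differently.
    h.update(len(key).to_bytes(8, "big"))
    h.update(key)
    h.update(value)
    return h.digest()


class CheckedStore:
    """Toy in-memory key/value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._entries = {}  # key -> (value, stored checksum)

    def put(self, key: bytes, value: bytes) -> None:
        self._entries[key] = (value, checksum(key, value))

    def get(self, key: bytes) -> bytes:
        value, stored = self._entries[key]
        if checksum(key, value) != stored:
            raise ValueError(f"checksum mismatch for key {key!r}")
        return value

    def corrupt_keys(self):
        """List only the entries whose data no longer matches its checksum."""
        return [k for k, (v, c) in self._entries.items() if checksum(k, v) != c]
```

A scrub pass would call `corrupt_keys()` and restore just those entries from a backup, leaving the rest of the database alone.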
I can understand it for a file system, but in a typical key/value store application the data elements are much smaller (likely even smaller than the hash itself).
Isn't a 256-bit Blake hash a little OTT, versus a simple CRC, or even a faster, smaller hash like MurmurHash or Jenkins-one-at-a-time?
It's a cryptographic hash, so it will detect deliberate tampering with the data, which a simple CRC, MurmurHash or Jenkins would not.
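One way to see why a CRC offers no tamper resistance: CRC-32 is affine over XOR for messages of equal length, so the checksum of a modified message can be computed from the old checksum and the change alone. The sketch below uses `zlib.crc32`; the helper names and the sample messages are made up for illustration.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def predict_crc(old_crc: int, delta: bytes) -> int:
    """Predict the CRC-32 after XORing `delta` into a message.

    Only the old checksum and the change are needed, not the message
    itself, because for equal-length inputs
    crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c).
    """
    zeros = bytes(len(delta))
    return old_crc ^ zlib.crc32(delta) ^ zlib.crc32(zeros)


original = b"pay alice 0010 coins"
tampered = b"pay alice 9910 coins"
delta = xor_bytes(original, tampered)

# The predicted value matches the real CRC of the tampered message.
assert predict_crc(zlib.crc32(original), delta) == zlib.crc32(tampered)
```

A cryptographic hash has no such algebraic structure, so finding a different message with a chosen digest is believed to be infeasible.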
Still, I'd like an option to use a faster, more efficient CRC or hash - bit rot is usually the main threat, rather than tampering. Not to mention that if a user can tamper with the data they can probably just create a new hash at the same time.

Using a cryptographic hash as a souped-up CRC seems rather odd, given how many more CPU cycles and RAM it will use, but I don't know the reasoning behind the decision; there must be one.

> if an attacker can tamper with the data they can probably just create a new hash at the same time

That's true for ordinary databases, but this was developed for a blockchain and uses a Merkle hash tree.

An attacker can only tamper with the data and create a new hash for a data item by also creating a new hash for every node up to the root of the tree. In a blockchain context, even that isn't enough: they'd have to modify the blockchain nodes as well, as I presume they periodically record tree root hashes.
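A small sketch of that property, assuming a plain binary Merkle tree over a list of values (Graviton's real tree layout isn't described here, and `merkle_root` is a hypothetical helper): changing any single item changes the root, so a root hash recorded elsewhere exposes the edit.

```python
import hashlib


def h(data: bytes) -> bytes:
    """256-bit BLAKE2s digest, standing in for the store's hash function."""
    return hashlib.blake2s(data).digest()


def merkle_root(leaves):
    """Compute the root hash of a binary Merkle tree over `leaves`."""
    if not leaves:
        return h(b"")
    # Distinct prefixes keep leaf hashes and inner-node hashes apart.
    level = [h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


items = [b"alice=10", b"bob=20", b"carol=30", b"dave=40"]
recorded_root = merkle_root(items)  # e.g. what a block would record

items[1] = b"bob=9999"              # tamper with one entry
assert merkle_root(items) != recorded_root
```

Recomputing the hashes along the path to the root is easy for the attacker; what they cannot do is make the new root equal the one already recorded.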

The hash tree gives it some other interesting features too. The O(n) diff time, where n is the number of changes output in the diff, is probably a consequence of having a hash tree.

The fast diff would also work with a non-cryptographic hash, but it wouldn't be considered quite reliable enough, because an occasional random collision would hide a real difference. With a cryptographic hash, even for non-security purposes, we treat the values as reliably unique for each input. Git, for example, depends on this property.
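Here is a rough sketch of how a hash tree makes diffs cheap, assuming two trees of identical shape built over the same number of values. The `Node`, `build` and `diff` names are hypothetical, not Graviton's API. Any subtree whose hashes match is skipped without being visited, so the work depends on the number of changes rather than on the size of the data set (in this simple version, roughly changes times tree depth).

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.blake2s(data).digest()


class Node:
    def __init__(self, digest, left=None, right=None, index=None):
        self.digest = digest
        self.left = left
        self.right = right
        self.index = index  # set only on leaves


def build(values, lo=0, hi=None):
    """Build a binary hash tree over values[lo:hi] (needs at least one value)."""
    if hi is None:
        hi = len(values)
    if hi - lo == 1:
        return Node(h(b"\x00" + values[lo]), index=lo)
    mid = (lo + hi) // 2
    left, right = build(values, lo, mid), build(values, mid, hi)
    return Node(h(b"\x01" + left.digest + right.digest), left, right)


def diff(a, b, stats=None):
    """Return indices of leaves that differ between two same-shaped trees."""
    if stats is not None:
        stats["visited"] = stats.get("visited", 0) + 1
    if a.digest == b.digest:
        return []            # identical subtree: skip it entirely
    if a.index is not None:
        return [a.index]     # differing leaf
    return diff(a.left, b.left, stats) + diff(a.right, b.right, stats)


old = [b"value-%d" % i for i in range(1024)]
new = list(old)
new[5] = b"changed"
new[700] = b"also changed"

stats = {}
changed = diff(build(old), build(new), stats)
# `changed` lists the two modified indices; `stats["visited"]` stays far
# below the 2047 nodes in each tree.
```

The skip in `diff` is only as trustworthy as the hash: with a weak hash, a collision would make a changed subtree look identical.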

I meant to say "attacker", rather than "user".