by UltraSane 482 days ago
VAST storage does something like this. Most storage arrays identify identical blocks by hash and store each one only once. VAST instead uses a content-aware hash, so the hashes of similar blocks are also similar. It stores a reference block for each unique hash; when new data comes in and is hashed, the most similar reference block is used as the base for byte-level deltas. In practice this works extremely well.

https://www.vastdata.com/blog/breaking-data-reduction-trade-...
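
To make the idea concrete, here is a rough Python sketch of similarity-based reduction. It is not VAST's actual algorithm, which I haven't seen. The bottom-k shingle sketch stands in for the "content-aware hash", `difflib` stands in for the delta encoder, and every name and constant (`SHINGLE`, `NUM_FEATURES`, `SimilarityStore`) is made up for illustration.

```python
import hashlib
from difflib import SequenceMatcher

SHINGLE = 8        # bytes per shingle (assumed value)
NUM_FEATURES = 4   # number of features kept in a sketch (assumed value)


def sketch(block: bytes) -> tuple:
    """Bottom-k sketch: the NUM_FEATURES smallest hashes over all shingles.
    Blocks that share most of their bytes tend to share most features."""
    hashes = {
        int.from_bytes(
            hashlib.blake2b(block[i:i + SHINGLE], digest_size=8).digest(), "big")
        for i in range(max(1, len(block) - SHINGLE + 1))
    }
    return tuple(sorted(hashes)[:NUM_FEATURES])


def make_delta(ref: bytes, new: bytes) -> list:
    """Encode `new` as copy-from-reference ranges plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, ref, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag != "delete":          # "replace" or "insert"
            ops.append(("data", new[j1:j2]))
    return ops


def apply_delta(ref: bytes, ops: list) -> bytes:
    """Rebuild the original block from a reference block and a delta."""
    out = bytearray()
    for op in ops:
        out += ref[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


class SimilarityStore:
    """Hypothetical store: one reference block per sketch, deltas otherwise."""

    def __init__(self):
        self.refs = {}  # sketch -> reference block

    def put(self, block: bytes):
        sk = sketch(block)
        # Pick the reference sharing the most features with the new block.
        best = max(self.refs, key=lambda r: len(set(r) & set(sk)), default=None)
        if best is None or not set(best) & set(sk):
            self.refs[sk] = block      # nothing similar: keep as a new reference
            return ("ref", sk)
        return ("delta", best, make_delta(self.refs[best], block))

    def get(self, entry) -> bytes:
        if entry[0] == "ref":
            return self.refs[entry[1]]
        _, ref_key, ops = entry
        return apply_delta(self.refs[ref_key], ops)
```

A block that differs from a stored reference by a few bytes should come back from `put` as a `"delta"` entry made of a couple of copy ranges and a short literal, while an unrelated block becomes a new reference. The linear scan over references is only for brevity; a real system would need an index over features.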

1 comment

That’s very interesting. Typically a Rabin fingerprint is used to identify identical chunks of data.
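
For comparison, the exact-match approach might look roughly like the Python below. Big caveat: this uses a Rabin-Karp style polynomial rolling hash as a stand-in for a true Rabin fingerprint (which works over GF(2) polynomials), and it uses the rolling hash only to pick chunk boundaries, with SHA-256 identifying identical chunks. The constants and function names are assumptions for illustration.

```python
import hashlib

WINDOW = 16            # rolling window in bytes (assumed value)
MASK = (1 << 6) - 1    # cut when the low 6 bits are zero (assumed value)
BASE = 257
MOD = (1 << 61) - 1


def chunks(data: bytes):
    """Yield content-defined chunks: boundaries depend on the bytes in the
    rolling window rather than on fixed offsets."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h, start = 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]             # trailing partial chunk


def dedup(data: bytes, store: dict) -> list:
    """Store each unseen chunk once; return the list of chunk digests
    (the "recipe") needed to rebuild `data`."""
    recipe = []
    for chunk in chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)
        recipe.append(key)
    return recipe
```

Rebuilding is just `b"".join(store[k] for k in recipe)`. A chunk that differs by a single byte gets a completely different digest and is stored in full, which is the gap the similarity approach is meant to close.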

Identifying similar blocks, and maybe sub-rechunking them, isn’t something I’ve ever considered.