by high_byte 1832 days ago
Cool, although I think a 4 MB window would be more efficient. 100 MB seems excessive; then I assume you wouldn't need a sliding window (if it works well enough for 100 MB).

The problem happens with any fixed window spacing, regardless of the block size.

If you create a block every X MB, inserting a single byte at the beginning of the file shifts every block boundary, so every subsequent block changes.
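To make the shift problem concrete, here is a minimal sketch in Python. It is not SnowFS code; the helper names (`chunk_hashes`, `unchanged_blocks`) and the tiny 4 KiB block size are assumptions chosen only for illustration. It hashes fixed-size blocks of a buffer, then compares an in-place edit against a one-byte insertion at the start.

```python
import hashlib
import random


def chunk_hashes(data: bytes, block_size: int) -> list:
    """Hash each fixed-size block of `data` (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def unchanged_blocks(old: bytes, new: bytes, block_size: int) -> int:
    """Count block positions where both versions have the same hash."""
    pairs = zip(chunk_hashes(old, block_size), chunk_hashes(new, block_size))
    return sum(1 for a, b in pairs if a == b)


# Hypothetical test data: 64 KiB of pseudo-random bytes, split into 16 blocks.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
block_size = 4 * 1024

# Case 1: overwrite the last byte in place (file length stays the same).
overwritten = original[:-1] + bytes([original[-1] ^ 0xFF])

# Case 2: insert a single byte at the very beginning (everything shifts by one).
prepended = b"\x00" + original

print("in-place edit, unchanged blocks:", unchanged_blocks(original, overwritten, block_size))
print("one-byte insert, unchanged blocks:", unchanged_blocks(original, prepended, block_size))
```

The in-place edit leaves 15 of the 16 blocks identical, while the one-byte insertion leaves none, and that holds whatever block size you pick.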

Technically speaking you're wrong, but I'm sure the author doesn't want to reimplement block storage devices, so the spirit of the message is probably correct.
Oh, I'm not talking about disks. This is based on how SnowFS (the library for this project) splits up big files into chunks:

https://github.com/Snowtrack/SnowFS/blob/main/src/common.ts#...

The intent is a simple form of delta encoding; the hope is that many chunks will be common between two versions.

I should clarify this. The 100 MB window in SnowFS is currently unrelated to compression; it is only used to check whether a block changed. Each block gets a hash. This is a fallback for some file formats where the mtime timestamp cannot be trusted. Some files have a change in the first block (e.g. the first 100 MB), and that is faster to compare than an entire 8 GB file. But the window size is dynamic and can be changed, and it could be used for compression in the future.
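A rough sketch of that kind of change check, in Python rather than SnowFS's TypeScript, and not taken from the SnowFS source: the function names (`block_hashes`, `has_changed`), the use of SHA-256, and the tiny block size are all assumptions for illustration. The idea is to hash the file block by block and stop at the first block whose hash differs from the stored one, so a change near the start is detected without reading the rest of the file.

```python
import hashlib
import os
import tempfile


def block_hashes(path: str, block_size: int):
    """Lazily yield one SHA-256 hex digest per fixed-size block of the file."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                return
            yield hashlib.sha256(block).hexdigest()


def has_changed(path: str, stored_hashes: list, block_size: int) -> bool:
    """Compare the file against previously stored block hashes.

    Returns as soon as one block differs, so later blocks are never read.
    """
    seen = 0
    for digest in block_hashes(path, block_size):
        if seen >= len(stored_hashes) or digest != stored_hashes[seen]:
            return True
        seen += 1
    # No mismatch found; the file changed only if it now has fewer blocks.
    return seen != len(stored_hashes)


# Hypothetical demo: a 4 KiB file with 1 KiB blocks stands in for an 8 GB file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scene.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(4 * 1024))

    stored = list(block_hashes(path, 1024))
    unchanged_result = has_changed(path, stored, 1024)

    # Flip the first byte in place, which changes only the first block.
    with open(path, "r+b") as f:
        first = f.read(1)
        f.seek(0)
        f.write(bytes([first[0] ^ 0xFF]))

    changed_result = has_changed(path, stored, 1024)

print(unchanged_result, changed_result)
```

With an untrustworthy mtime, a check like this still has to read the whole file when nothing changed; the saving only shows up when a difference appears in an early block.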
Ahh, this is my bad. For some reason I assumed the blocks were part of the storage scheme, but I see they are only used to compute the hash, and that the whole file is added to the zip. Sorry for the misunderstanding!