by londons_explore
1223 days ago
Another use case: binary trees of compressed data. Imagine you're compressing Wikipedia and want the best possible ratio while still being able to access any article at random. If you compress each article individually, phrases like 'citation needed' end up replicated in most articles. Another approach is to use a shared dictionary, which solves the 'citation needed' case. But we can still do better. There will be lots of common content between the 'general relativity' and 'special relativity' pages, and likewise between the 'France' and 'Germany' pages, so ideally we'd have different dictionaries for different topics. But the dictionaries themselves overlap, so it would be good to compress them too. So we end up with a tree of dictionaries, and decoding any article means walking from the root down to it. However, if you now want to do a full decompression of every article, you don't want to reprocess the same dictionaries for every article. So you want to be able to checkpoint the decompressor state.
http://fileformats.archiveteam.org/wiki/Zstandard_dictionary
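
A rough sketch of the checkpointing idea, in case it helps. It uses Python's `zlib` rather than Zstandard, only because its stream objects have a `copy()` method in the standard library. The toy corpus, the helper names `extend` and `resume`, and the two-level tree are all made up for illustration. Shared text is modeled as a common stream prefix instead of a real trained dictionary, and DEFLATE's 32 KiB window means this would not scale to real dictionaries as-is.

```python
import zlib

# Hypothetical toy corpus: shared root text, per-topic text, then articles.
ROOT = b"References. External links. See also. [citation needed] "
TOPICS = {
    "physics": b"relativity spacetime Einstein speed of light reference frame ",
    "europe": b"republic European Union capital population borders ",
}
ARTICLES = {
    "special relativity": ("physics", b"Special relativity: Einstein, speed of light [citation needed]"),
    "general relativity": ("physics", b"General relativity: Einstein, spacetime curvature [citation needed]"),
    "France": ("europe", b"France: republic in the European Union, capital Paris [citation needed]"),
    "Germany": ("europe", b"Germany: republic in the European Union, capital Berlin [citation needed]"),
}


def extend(parent, data, last=False):
    """Compress data as a continuation of parent's stream; parent is untouched."""
    child = parent.copy()
    mode = zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH
    return child, child.compress(data) + child.flush(mode)


def resume(checkpoint, blob):
    """Decompress blob starting from a checkpoint; the checkpoint stays reusable."""
    child = checkpoint.copy()
    return child, child.decompress(blob)


# Compress: each node stores only the bytes emitted after its parent's state.
root_c, root_blob = extend(zlib.compressobj(9), ROOT)
topic_c, topic_blob = {}, {}
for name, text in TOPICS.items():
    topic_c[name], topic_blob[name] = extend(root_c, text)
article_blob = {
    title: extend(topic_c[topic], body, last=True)[1]
    for title, (topic, body) in ARTICLES.items()
}

# Full decompression: process the root once, each topic once, then fan out.
root_d, _ = resume(zlib.decompressobj(), root_blob)
topic_d = {name: resume(root_d, blob)[0] for name, blob in topic_blob.items()}
decoded = {
    title: resume(topic_d[ARTICLES[title][0]], blob)[1]
    for title, blob in article_blob.items()
}
```

Random access to one article still works without any checkpoints: concatenating the blobs along the path, e.g. `root_blob + topic_blob["physics"] + article_blob["special relativity"]`, gives an ordinary zlib stream whose output is the root text, the topic text, and the article in that order.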