by mbreese
637 days ago
In bioinformatics we use a modified gzip format called bgzip (the format itself is BGZF) that exploits this fact heavily. The entire file is made of concatenated gzip chunks, and each chunk stores its own compressed size in an extra field of its gzip header. That lets you hop from chunk to chunk without decompressing anything, so you can build an index, seek straight to the chunk you need, and decompress only that one. Sadly, the authors hard-coded the expected header layout, so it's not fully gzip compatible (you can't add your own arbitrary header fields). For example, I wanted to add a per-chunk hash and optional encryption by adding my own header elements, but since the existing tooling all expects that fixed header, it can't be done within the existing format. Overall, though, it is easily indexed and makes reading compressed data pretty easy. So there you go: a practical use for a gzip party trick!
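To make the chunk layout concrete, here is a minimal Python sketch, assuming the BGZF layout described in the SAM/BAM format specification: an 18-byte gzip header with the FEXTRA flag set and a single `BC` subfield whose 2-byte payload is BSIZE (total block size minus one). The helper names `bgzf_block`, `block_offsets`, `read_block`, and `seek_virtual` are hypothetical, not htslib's API, and the empty EOF block that real BGZF files end with is omitted. Note that `block_offsets` reads BSIZE at a fixed byte position, which is exactly the kind of fixed-header assumption that breaks once you add your own header fields.

```python
import gzip
import struct
import zlib

def bgzf_block(data: bytes) -> bytes:
    """Compress `data` into one BGZF-style gzip member (hypothetical helper)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib wrapper
    cdata = comp.compress(data) + comp.flush()
    bsize = 18 + len(cdata) + 8  # 18-byte header + deflate data + 8-byte footer
    if bsize > 65536:
        raise ValueError("block too large: BSIZE - 1 must fit in 16 bits")
    header = struct.pack(
        "<BBBBIBBHBBHH",
        31, 139, 8, 4,  # ID1, ID2, CM=deflate, FLG=FEXTRA
        0, 0, 255,      # MTIME, XFL, OS=unknown
        6,              # XLEN: exactly one 6-byte extra subfield
        66, 67, 2,      # SI1='B', SI2='C', SLEN=2
        bsize - 1,      # BSIZE: total block size minus one
    )
    footer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return header + cdata + footer

def block_offsets(buf: bytes) -> list[int]:
    """Find every block start by hopping BSIZE + 1 bytes; nothing is decompressed.
    Assumes the fixed header above, so BSIZE always sits at byte 16 of a block."""
    offsets, pos = [], 0
    while pos < len(buf):
        (bsize,) = struct.unpack_from("<H", buf, pos + 16)
        offsets.append(pos)
        pos += bsize + 1
    return offsets

def read_block(buf: bytes, offset: int) -> bytes:
    """Decompress only the single block starting at `offset`."""
    (bsize,) = struct.unpack_from("<H", buf, offset + 16)
    return gzip.decompress(buf[offset:offset + bsize + 1])

def seek_virtual(buf: bytes, voffset: int) -> bytes:
    """Resolve a virtual offset (block offset << 16 | offset within the
    uncompressed block) and return the rest of that block from there."""
    block_off, within = voffset >> 16, voffset & 0xFFFF
    return read_block(buf, block_off)[within:]

records = [b"chr1\t100\tA\n" * 3, b"chr2\t200\tC\n" * 3, b"chr3\t300\tG\n" * 3]
bgz = b"".join(bgzf_block(r) for r in records)

# The whole file is still ordinary multi-member gzip...
assert gzip.decompress(bgz) == b"".join(records)
# ...but you can also jump straight to the second block.
offs = block_offsets(bgz)
assert read_block(bgz, offs[1]) == records[1]
```

An index then only needs to store virtual offsets; any plain gzip reader can still decompress the whole file end to end.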