Hacker News
by karaterobot 795 days ago
What do you mean it's not seekable?

> ZIP files are a collection of individually compressed files, with a directory as a footer to the file, which makes it easy to seek to a specific file without reading the whole file... The nature of .zip files makes it possible to seek and read just the columns required without having to read/decode the other columns.
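To make the quoted point concrete, here is a minimal Python sketch of reading a single column out of a ZIP archive via its central directory. The one-member-per-column layout, the member names (`id`, `name`, `city`), and the newline-separated values are assumptions for illustration only, not the actual ZSV layout; `build_archive` and `read_column` are hypothetical helpers.

```python
import io
import zipfile


def build_archive(columns):
    """Build an in-memory ZIP with one member per column (assumed layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, values in columns.items():
            # Each column is stored as its own individually compressed member.
            zf.writestr(name, "\n".join(values) + "\n")
    buf.seek(0)
    return buf


def read_column(archive, name):
    """Decompress only the requested member; other members are not decoded."""
    with zipfile.ZipFile(archive) as zf:
        # ZipFile reads the central directory at the end of the file,
        # then seeks straight to the requested member's offset.
        with zf.open(name) as member:
            return member.read().decode("utf-8").splitlines()


archive = build_archive({
    "id": ["1", "2", "3"],
    "name": ["ann", "bob", "cy"],
    "city": ["Oslo", "Lima", "Pune"],
})
names = read_column(archive, "name")
```

Only the `name` member is decompressed here; `id` and `city` are located through the directory but never read.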

1 comment

Seeking within a column
There are two ways to limit how many rows of a column you have to read. One is file partitioning: keeping many ZSV files rather than one giant one, ideally organized by partitioning key field(s). The other is mentioned as an extension to the format itself, and it works much like row groups do in Parquet. https://github.com/Hafthor/zsvutil?tab=readme-ov-file#row-gr...

Thanks for taking a look.