by londons_explore 2075 days ago
The ZIM file format is far from ideal for compression efficiency: the algorithms that compress best typically don't allow random access without decompressing everything.
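
To make the trade-off concrete, here is a rough sketch using Python's standard `lzma` module, not the ZIM library. It compares one big compressed stream against independently compressed clusters of articles. As I understand it, ZIM takes roughly the clustered approach. The corpus, the shared "infobox" boilerplate, and the cluster sizes are all made up for illustration. Smaller clusters make random access cheaper but lose cross-article redundancy and pay per-stream overhead, so the total size grows.

```python
import lzma
import random

# Hypothetical stand-in corpus: each "article" shares some template-like
# boilerplate (as many wiki pages do) plus randomly varying body text.
BOILERPLATE = b"{{Infobox settlement | name = | country = | population = }} "

def make_corpus(n_articles=200, words_per_article=200, seed=0):
    rng = random.Random(seed)
    vocab = ["river", "city", "born", "album", "species", "league",
             "census", "district", "railway", "village"]
    return [BOILERPLATE + " ".join(rng.choice(vocab) for _ in range(words_per_article)).encode()
            for _ in range(n_articles)]

def compress_whole(articles):
    """One stream: best ratio, but reading article i means decompressing
    everything up to it (in practice, usually the whole thing)."""
    return lzma.compress(b"".join(articles))

def compress_clustered(articles, cluster_size):
    """Independently compressed clusters, each stored with its articles' lengths."""
    clusters = []
    for start in range(0, len(articles), cluster_size):
        group = articles[start:start + cluster_size]
        clusters.append((lzma.compress(b"".join(group)), [len(a) for a in group]))
    return clusters

def read_article(clusters, cluster_size, i):
    """Random access: decompress only the cluster that holds article i."""
    blob, lengths = clusters[i // cluster_size]
    data = lzma.decompress(blob)
    j = i % cluster_size
    offset = sum(lengths[:j])
    return data[offset:offset + lengths[j]]

def clustered_size(clusters):
    return sum(len(blob) for blob, _ in clusters)

if __name__ == "__main__":
    articles = make_corpus()
    print("whole stream:", len(compress_whole(articles)), "bytes")
    for size in (1, 10, 50):
        print(f"clusters of {size}:", clustered_size(compress_clustered(articles, size)), "bytes")
    # Fetch one article without decompressing any other cluster.
    clusters = compress_clustered(articles, 10)
    assert read_article(clusters, 10, 123) == articles[123]
```

Running it prints the sizes for each layout. Clusters of 1 give the cheapest lookups but the worst total size, and one whole stream gives the reverse. A real archive picks a cluster size somewhere in between.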

Also, Wikipedia has a lot of spam and orphan pages, insanely long lists, etc. Those are hard to filter out algorithmically.