by GuB-42 331 days ago
Kiwix (what the author used) uses "zim" files, which are compressed. I don't know where the difference comes from, but a Kiwix archive is an image of the website, so it may include some things the raw Wikipedia dump doesn't.

And 57 GB to 25 GB would be pretty bad compression. You can expect a compression ratio of at least 3 on natural English text.
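
To put numbers on that, here is a quick sketch. The function name is mine, and it assumes 57 GB is the uncompressed size and 25 GB the compressed one:

```python
def compression_ratio(uncompressed_gb: float, compressed_gb: float) -> float:
    """Ratio of original size to compressed size (higher is better)."""
    return uncompressed_gb / compressed_gb

# Sizes from the discussion, assumed to be uncompressed vs. compressed.
observed = compression_ratio(57, 25)

# Size that a ratio of 3 would give for the same 57 GB of input.
size_at_ratio_3_gb = 57 / 3

print(f"observed ratio: {observed:.2f}")
print(f"size at ratio 3: {size_at_ratio_3_gb:.0f} GB")
```

So 57 GB to 25 GB is a ratio of about 2.3, while a ratio of 3 would bring the same input down to roughly 19 GB.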