by pradn 2159 days ago
The English Wikipedia text (no images) is ~20 GB compressed and ~60 GB uncompressed.

I only use the sources available to me at the time.

Another value published by Wikipedia is 30 GiB [1] as of 2020, which includes punctuation and markup.

I explicitly put the measurement unit as ASCII characters. If you have a better source for your size (remember: ASCII characters for article words only, no markup), feel free to post it.
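
The figures in this thread mix decimal GB and binary GiB, so it may help to put them on the same scale before comparing. A minimal sketch, assuming 1 GiB = 2**30 bytes and 1 GB = 10**9 bytes; the helper name `gib_to_gb` is made up for illustration:

```python
def gib_to_gb(gib: float) -> float:
    """Convert binary gibibytes (2**30 bytes) to decimal gigabytes (10**9 bytes)."""
    return gib * 2**30 / 10**9

# The 30 GiB value cited above, expressed in decimal GB.
print(round(gib_to_gb(30), 1))  # 32.2
```

For plain ASCII text, one character is one byte, so the same conversion applies to character counts.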

[1] https://en.wikipedia.org/wiki/Wikipedia:Size_in_volumes