by read_if_gay_
2075 days ago
> all the content of English's wikipedia without images and videos in just 36GB

36GB seems like a really big number if it's just text. A cursory Google search says 1MB will hold about 500 pages of text (ignoring compression). So 36GB would be something like 18 million pages? Let's say a 1000 page book is 10cm wide, so 18M pages wind up as 1800 meters of books, or 180 meter-wide bookshelves with 10 shelves each, which is maybe a large library? It seems like a lot of that must be external sources. I wonder what percentage was actually written by Wikipedia editors?
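For anyone who wants to check the back-of-the-envelope numbers, here is a quick sketch in Python. Everything in it is an assumption taken from the estimate above (500 pages per MB, 1000 MB per GB, 1000-page books that are 10cm wide, 1-meter shelves, 10 shelves per bookcase); the function name is just made up for illustration.

```python
def bookshelf_estimate(size_gb=36, pages_per_mb=500, pages_per_book=1000,
                       book_width_cm=10, shelf_width_m=1, shelves_per_case=10):
    """Rough estimate only; every default is an assumption from the comment."""
    pages = size_gb * 1000 * pages_per_mb            # decimal GB -> MB -> pages
    books = pages // pages_per_book                  # number of 1000-page books
    shelf_meters = books * book_width_cm // 100      # total shelf length in meters
    bookcases = shelf_meters // (shelf_width_m * shelves_per_case)
    return pages, books, shelf_meters, bookcases


pages, books, shelf_meters, bookcases = bookshelf_estimate()
print(pages, books, shelf_meters, bookcases)
```

With those assumptions it comes out to 18,000,000 pages, 18,000 books, 1800 meters of shelf and 180 bookcases, so the arithmetic holds up.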
A few things to note, though:
1/ It's not pure text content, it's HTML content, which has a significant overhead.
2/ A ZIM file is not just compressed content; it also contains huge indexes recording where each piece of content is located. You look up your article's title in the index, find the position of your article in the file, and decompress just that part. This is what allows selective decompression without decompressing the whole file.
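To make point 2 a bit more concrete, here is a toy sketch of the general idea in Python. It is not the actual ZIM layout: the real format has its own structures and compression, and the functions `build_archive` and `read_article` are hypothetical names used only for this illustration. Each article is compressed on its own with `zlib`, and an index maps the title to an (offset, length) pair in the blob.

```python
import zlib


def build_archive(articles):
    """Compress each article separately and record where it lands in the blob."""
    index = {}            # title -> (offset, compressed length)
    blob = bytearray()
    for title, html in articles.items():
        compressed = zlib.compress(html.encode("utf-8"))
        index[title] = (len(blob), len(compressed))
        blob += compressed
    return index, bytes(blob)


def read_article(index, blob, title):
    """Look up the title, slice out only that article's bytes, decompress them."""
    offset, length = index[title]
    return zlib.decompress(blob[offset:offset + length]).decode("utf-8")


articles = {"Alpha": "<p>first article</p>", "Beta": "<p>second article</p>"}
index, blob = build_archive(articles)
print(read_article(index, blob, "Beta"))
```

Reading "Beta" only touches the bytes at its recorded offset, so the rest of the blob never has to be decompressed. The price is the index itself, which takes up space on top of the compressed content.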