Hacker News
by kopo 2804 days ago
Check out Kiwix or Zeal. They let you read entire archives of Wikipedia, Stack Overflow, TED talks, etc. offline, plus search. Throw in Elasticsearch and you get really powerful, customizable search. You can also create your own archive of web content. Given the size of modern hard drives, people have no idea how much quality content can happily live on a local machine. No internet required.
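Kiwix and Elasticsearch each do their own indexing, so this is not how either of them works internally. As a rough illustration of why search over a local archive is cheap once the text is on disk, here is a toy inverted index in Python. The function names and sample "articles" are hypothetical; real engines add ranking, stemming, and compressed on-disk indexes.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of ASCII letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each token to the set of document titles whose body contains it.
    index: dict[str, set[str]] = defaultdict(set)
    for title, body in docs.items():
        for token in tokenize(body):
            index[token].add(title)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: only documents containing every query token match.
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical sample documents standing in for an offline archive.
docs = {
    "Hard disk drive": "A hard disk drive stores data on rotating platters.",
    "Wikipedia": "Wikipedia is a free online encyclopedia.",
    "Offline reader": "An offline reader lets you browse Wikipedia without internet.",
}
index = build_index(docs)
print(sorted(search(index, "wikipedia offline")))  # ['Offline reader']
```

Lookups only touch the posting sets for the query terms, not the whole archive, which is why search stays fast even over tens of gigabytes of text.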

TIL the entire Wikipedia is less than 100 GB when compressed.
That's with no images, and IIRC it also didn't include any of the math markup the last time I looked at one of the XML dumps in one of the loaders (I think it was Kiwix).
Totally forgot about images. I just dug a little further and found this:

"The size of the media files in Wikimedia Commons, which includes the images, videos and other media used across all the language-specific Wikipedias was described as well over 23 TB near the end of 2014"

Considering 30 TB of hard drive storage costs around $1,500, that's still interesting.

I wonder what the size is at this moment, maybe double? I read somewhere recently that 90% of the content on the Web was created in the last 2 years. Not sure if that's "special media" focused, i.e. video and pictures, as opposed to text. But I'm sure the size of the dump has increased substantially in the last 4 years.
Very likely. There would at least need to be a filter at some level to cut the size down (maybe grabbing a single language or converting videos to low resolution).
If you grab those WD Easystore 8 TB drives when they're on sale, you can have 32 TB for around $560.
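To put the two price quotes in this thread side by side, here is the back-of-the-envelope arithmetic as a small Python sketch. It assumes the figures exactly as quoted ($1,500 for 30 TB, about $560 for four 8 TB drives, Commons media at 23 TB in 2014); the helper names are made up for illustration.

```python
import math


def cost_per_tb(total_cost: float, total_tb: float) -> float:
    # Price per terabyte for a given purchase.
    return total_cost / total_tb


def drives_needed(archive_tb: float, drive_tb: float) -> int:
    # Round up: a partially used drive still has to be bought whole.
    return math.ceil(archive_tb / drive_tb)


# Figures as quoted in this thread (2018-era prices).
big_hdd = cost_per_tb(1500, 30)        # "$1,500 for 30 TB"
easystore = cost_per_tb(560, 32)       # four 8 TB drives for ~$560
commons_drives = drives_needed(23, 8)  # Commons media, 23 TB (2014)
commons_cost = commons_drives * 560 / 4

print(f"${big_hdd:.2f}/TB vs ${easystore:.2f}/TB")
print(f"{commons_drives} x 8 TB drives for 23 TB, about ${commons_cost:.0f}")
```

That works out to $50.00/TB versus $17.50/TB, and 3 drives (about $420) for the 2014 Commons figure. Treat it as a lower bound: the quote says "well over 23 TB", and a drive sold as 8 TB formats to somewhat less usable space.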
You can find versions that are 50 GB if you dig around the Kiwix download page! The entire Stack Overflow dump is around the same size.
Sadly, with a 128 GB MacBook Pro, this doesn't really apply to me unless I carry around an external drive.