by diggan 2655 days ago
While building the Wikipedia mirror on IPFS (with search), we tried using Wikipedia's own dumps but ended up using Zim archives from kiwix.org instead. The end result is here: https://github.com/ipfs/distributed-wikipedia-mirror

For actually ingesting the archives, dignifiedquire expanded a Rust utility aptly named Zim, which you can find here: https://github.com/dignifiedquire/zim

Both repos contain information (and code, of course) on how to extract information from the Zim archives.
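
To give a rough idea of what extraction starts from, here is a minimal Python sketch that reads the fixed 80-byte header at the start of a Zim file. This is not code from either repo; the field layout follows the openZIM header description as I understand it, and the function name `parse_zim_header` and the dictionary keys are my own, so check it against the spec before relying on it.

```python
import struct
import uuid

# Magic number at the start of a ZIM file (bytes "ZIM\x04", little-endian).
ZIM_MAGIC = 0x044D495A

# Assumed header layout (little-endian, no padding, 80 bytes total):
# magic, major version, minor version, uuid, article count, cluster count,
# url pointer pos, title pointer pos, cluster pointer pos, mime list pos,
# main page, layout page, checksum pos.
HEADER = struct.Struct("<IHH16sIIQQQQIIQ")


def parse_zim_header(data: bytes) -> dict:
    """Parse the fixed-size header from the first bytes of a ZIM file."""
    if len(data) < HEADER.size:
        raise ValueError("need at least %d bytes for the header" % HEADER.size)
    (magic, major, minor, raw_uuid, article_count, cluster_count,
     url_ptr_pos, title_ptr_pos, cluster_ptr_pos, mime_list_pos,
     main_page, layout_page, checksum_pos) = HEADER.unpack_from(data)
    if magic != ZIM_MAGIC:
        raise ValueError("not a ZIM file (bad magic number)")
    return {
        "version": (major, minor),
        "uuid": uuid.UUID(bytes=raw_uuid),
        "article_count": article_count,
        "cluster_count": cluster_count,
        "url_ptr_pos": url_ptr_pos,
        "title_ptr_pos": title_ptr_pos,
        "cluster_ptr_pos": cluster_ptr_pos,
        "mime_list_pos": mime_list_pos,
        "main_page": main_page,
        "layout_page": layout_page,
        "checksum_pos": checksum_pos,
    }
```

You would call it with something like `parse_zim_header(open(path, "rb").read(80))`. The pointer positions it returns are where the URL, title, and cluster tables begin; walking those tables and decompressing clusters is the part the repos above handle.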