Hacker News
by polarix 1336 days ago
This has been available for a while, but it's great to see some acknowledgement, especially since the most recent data set was stuck at 2019 for a while.

Here are the datasets: http://download.kiwix.org/zim/stack_exchange/

It's not clear to me why the data set shrank between 2019/3 and 2022/6; was something excluded? Compression improvements?

> stackoverflow.com_en_all_2019-02.zim 2019-03-12 19:53 134G

> stackoverflow.com_en_all_2022-05.zim 2022-06-17 12:36 75G

2 comments

The data isn't stuck. It's available here:

https://archive.org/details/stackexchange

It's the "official" place to get the data.

I've downloaded it several times and extracted my own contributions.
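
For anyone who wants to do the same, here is a rough sketch of the extraction step. It assumes the dump's Posts.xml stores one post per `<row>` element with an `OwnerUserId` attribute, so check that against the file you actually download. The function name `iter_user_posts`, the sample data, and user id 42 are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_user_posts(source, user_id):
    """Yield the attributes of every <row> whose OwnerUserId matches user_id.

    `source` is a filename or a binary file object. The <row>/OwnerUserId
    layout is an assumption about the dump format, not something confirmed here.
    """
    wanted = str(user_id)
    # iterparse streams the file, so a multi-gigabyte Posts.xml is never
    # loaded into memory as one tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row" and elem.get("OwnerUserId") == wanted:
            yield dict(elem.attrib)
        # Drop the element's attributes and children once it has been seen.
        elem.clear()


# Tiny hypothetical sample standing in for a real Posts.xml.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="42" Title="First" />
  <row Id="2" PostTypeId="2" OwnerUserId="7" />
  <row Id="3" PostTypeId="2" OwnerUserId="42" />
</posts>"""

mine = list(iter_user_posts(io.BytesIO(SAMPLE), 42))
print([post["Id"] for post in mine])
```

On a real dump you would pass the path to the unpacked Posts.xml instead of the in-memory sample, and your own numeric user id.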

The article states:

> ... to ensure that an up-to-date version of our dataset is easily available for those who need it, and will work to improve its readability and reduce its size so there is less friction for end users...