by roozbeh18 824 days ago
Can someone tell me how this is collected in SQLite?

I wrote a blog post a while back about reading these dumps: https://search.feep.dev/blog/post/2021-09-04-stackexchange

Presumably they have a script that does something similar to that process, and then writes the resulting data into a predefined table structure.

Nice post!

Yep, my process is similar. It goes...

  - decompress (users|posts)  
  - split into batches of 10,000  
  - xsltproc the batch into sql statements  
  - pipe the batches of statements into sqlite in parallel using flocks for coordination
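
For anyone who wants to try the batching idea without xsltproc, here is a rough single-process sketch in Python. It is not the pipeline described above: it swaps the XSLT step for `xml.etree.ElementTree.iterparse` and skips the parallel/flock part entirely. The `users` table layout, the attribute names (`Id`, `DisplayName`, `Reputation`), and the helper names are assumptions for illustration, so check them against the actual dump.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from itertools import islice

BATCH_SIZE = 10_000  # same batch size as mentioned in the list above


def iter_rows(xml_source):
    """Stream <row .../> elements one at a time instead of parsing the whole file."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)  # copy the attributes before clearing
            elem.clear()  # drop the element's data; the root still holds the emptied node


def batched(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def load_users(xml_source, conn, batch_size=BATCH_SIZE):
    """Insert rows into a hypothetical `users` table, one transaction per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, display_name TEXT, reputation INTEGER)"
    )
    total = 0
    for batch in batched(iter_rows(xml_source), batch_size):
        with conn:  # commits the batch, or rolls it back if an insert fails
            conn.executemany(
                "INSERT INTO users VALUES (?, ?, ?)",
                [
                    (int(r["Id"]), r.get("DisplayName"), int(r.get("Reputation", 0)))
                    for r in batch
                ],
            )
        total += len(batch)
    return total


if __name__ == "__main__":
    # Tiny made-up sample standing in for a decompressed Users.xml.
    sample = b'<users><row Id="1" DisplayName="alice" Reputation="101"/></users>'
    conn = sqlite3.connect(":memory:")
    print(load_users(io.BytesIO(sample), conn))
```

Wrapping each batch in its own transaction is what keeps the inserts fast; committing row by row would be far slower. Running several of these loaders in parallel against one database would still need some form of write coordination, which is presumably what the flocks are for.
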
On my M1 Max, it takes about 40 minutes for the whole network. Then I compress each database with brotli, which takes about 5 hours.