by JasonPunyon 824 days ago
Nice post!

Yep, my process is similar. It goes...

  - decompress (users|posts)
  - split into batches of 10,000
  - xsltproc the batch into sql statements
  - pipe the batches of statements into sqlite in parallel, using flock for coordination
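The four steps above can be sketched roughly in Python. This is not the author's script, and several pieces are assumptions: decompression is left out (the sketch starts from an already-extracted XML file), each file is assumed to be a flat list of `<row .../>` elements, Python's `xml.etree.ElementTree` is swapped in for `xsltproc`, the target table must already exist, and coordination uses an exclusive `fcntl.flock` on a lock file next to the database so only one batch writes at a time. All function names (`iter_rows`, `batches`, `batch_to_sql`, `apply_with_lock`, `load`) are hypothetical, and `fcntl` makes this Unix-only.

```python
import fcntl
import sqlite3
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical sketch of the batch pipeline; not the author's actual script.
BATCH_SIZE = 10_000  # batch size mentioned in the comment


def iter_rows(xml_path):
    """Yield each <row .../> element's attributes as a dict while parsing."""
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # drop the parsed element's data to limit memory use


def batches(rows, size=BATCH_SIZE):
    """Group an iterable into lists of at most `size` items."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def batch_to_sql(table, columns, batch):
    """Stand-in for the xsltproc step: one INSERT per row, wrapped in a transaction.

    Table and column names are interpolated as-is, so they must be trusted.
    """
    def quote(value):
        return "NULL" if value is None else "'" + value.replace("'", "''") + "'"

    cols = ", ".join(columns)
    inserts = [
        f"INSERT INTO {table} ({cols}) VALUES "
        f"({', '.join(quote(row.get(c)) for c in columns)});"
        for row in batch
    ]
    return "BEGIN;\n" + "\n".join(inserts) + "\nCOMMIT;\n"


def apply_with_lock(db_path, lock_path, sql):
    """Run one batch of statements while holding an exclusive flock."""
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until no other writer holds it
        try:
            conn = sqlite3.connect(db_path)
            try:
                conn.executescript(sql)
            finally:
                conn.close()
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


def load(xml_path, db_path, table, columns, workers=4, batch_size=BATCH_SIZE):
    """Convert batches concurrently; the flock serializes writes. Returns rows loaded."""
    lock_path = db_path + ".lock"

    def work(batch):
        apply_with_lock(db_path, lock_path, batch_to_sql(table, columns, batch))
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, batches(iter_rows(xml_path), batch_size)))
```

Threads keep the sketch short, but Python's GIL limits how much the conversion step actually overlaps. For real CPU parallelism the workers would be separate processes (for example `ProcessPoolExecutor`, or separate shell pipelines as in the comment), which is the situation a flock lock file is really designed to coordinate.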

On my M1 Max it takes about 40 minutes for the whole network. Then I compress each database with brotli, which takes about 5 hours.