by voltagex_ 3448 days ago
This is what an "ETL" (Extract-Transform-Load) tool is for. Something like FME Server [1] would handle the first two points and the last point well.

For unzipping something that crazy, I'm interested in your solution. I think I'd have to write a custom zip library and use a RAM disk or similar.

1: https://www.safe.com/fme/fme-server/


Yes, that's ETL. Classic ETL dealt with databases; the modern variant has relaxed this constraint.

As for the zip: we simply run "unzip -p" and carefully stream-process the output with a custom program that reads the XML and transforms it. This cuts processing time from hours (extracting the zip, creating all the directories, then visiting each file) to minutes (reading from a single file).
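The custom program isn't shown, so here is a minimal sketch of the same idea (decompress and parse in one streaming pass, never writing the extracted tree to disk). It uses Python's standard `zipfile` and `xml.etree.ElementTree.iterparse` in place of `unzip -p` plus a custom reader. It is an illustrative substitute, not the poster's method. The function name `stream_xml_members` and the `<item>` tag are hypothetical stand-ins for whatever the real transformation targets.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def stream_xml_members(zip_file, tag):
    """Yield the text of every <tag> element in every .xml member of zip_file.

    Hypothetical helper: each member is decompressed and parsed as a stream,
    so no files or directories are created on disk.
    """
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            # Skip directory entries and non-XML members
            if info.is_dir() or not info.filename.endswith(".xml"):
                continue
            # archive.open() returns a file-like object that decompresses on the fly
            with archive.open(info) as member:
                # iterparse reports each element as soon as its closing tag is read
                for _event, elem in ET.iterparse(member, events=("end",)):
                    if elem.tag == tag:
                        yield elem.text
                    # Drop the element's text and children once handled to save memory
                    elem.clear()


if __name__ == "__main__":
    # Build a small zip in memory so the example runs without any input files
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a/1.xml", "<root><item>x</item><item>y</item></root>")
        zf.writestr("b/2.xml", "<root><item>z</item></root>")
        zf.writestr("readme.txt", "ignored")
    buf.seek(0)
    print(list(stream_xml_members(buf, "item")))  # ['x', 'y', 'z']
```

One design note: `unzip -p` writes every member's contents to stdout back to back, so a program reading that pipe has to detect where one XML document ends and the next begins. Opening each member separately, as above, sidesteps that at the cost of doing the decompression in-process.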