Hacker News
by mtrn 3447 days ago
Yes, that's ETL. Classic ETL dealt with databases; the modern variant has relaxed this constraint.

As for the zip: we simply run "unzip -p" and stream-process the output carefully (with a custom program that reads the XML and transforms it). This cuts processing time from hours (extracting the zip and creating all the directories, then visiting each file) to minutes (reading from a single file).
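
For illustration only, here is a rough Python sketch of that kind of pipeline. It is not our actual program. The names iter_elements and unzip_stream, the DECL marker and the "record" tag are all made up for the example. It also assumes that every file in the archive starts with an XML declaration, which is what lets the code tell where one document ends and the next begins in the concatenated output of "unzip -p".

```python
import subprocess
import xml.etree.ElementTree as ET

# Assumption: every archive member starts with an XML declaration, and this
# byte sequence never occurs inside a document (e.g. in CDATA or comments).
DECL = b"<?xml "


def unzip_stream(zip_path):
    """Start `unzip -p`; its stdout carries all members back to back."""
    return subprocess.Popen(["unzip", "-p", zip_path], stdout=subprocess.PIPE)


def _drain(parser, tag):
    # Hand out finished elements, then clear them to keep memory low.
    # Callers should use each element before asking for the next one.
    for _, elem in parser.read_events():
        if elem.tag == tag:
            yield elem
            elem.clear()


def iter_elements(stream, tag, chunk_size=1 << 16):
    """Yield each <tag> element from a stream of concatenated XML documents."""
    parser, fed = ET.XMLPullParser(events=("end",)), False
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        eof = not chunk
        buf += chunk
        i = buf.find(DECL)
        while i >= 0:
            # Bytes before the declaration belong to the previous document.
            if fed or buf[:i].strip():
                parser.feed(buf[:i])
                parser.close()
                yield from _drain(parser, tag)
            # Start a fresh parser for the next document.
            parser, fed = ET.XMLPullParser(events=("end",)), True
            parser.feed(DECL)
            buf = buf[i + len(DECL):]
            i = buf.find(DECL)
        # Hold back a short tail in case a declaration is split across chunks.
        cut = len(buf) if eof else max(0, len(buf) - (len(DECL) - 1))
        if fed or buf[:cut].strip():
            parser.feed(buf[:cut])
            fed = True
            yield from _drain(parser, tag)
        buf = buf[cut:]
        if eof:
            if fed:
                parser.close()
                yield from _drain(parser, tag)
            return
```

Usage would look something like: proc = unzip_stream("dump.zip"), then loop over iter_elements(proc.stdout, "record") and transform each element, then proc.wait(). Nothing is written to disk and no directories are created, which is where the time saving comes from. Note that each document's root element still collects the emptied child elements, so for very large single documents you may want to drop those from the root as well.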