by mrocklin 3954 days ago
Flat CSV or JSON files are slow to parse, because every value has to be tokenized and converted from text. Fast CSV parsers and gzip decompression both run at around 100 MB/s. If you want to go faster than this, you'll need to use better (ideally binary) formats.
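
To make the text-versus-binary gap concrete, here is a rough sketch using only the Python standard library. It is not the method from the notebook linked below; the helper names (`make_data`, `parse_csv`, `parse_binary`) and the dataset size are made up for illustration, and the timings will vary by machine. The point is only that loading raw float64 bytes skips the per-value string parsing that CSV requires.

```python
import csv
import io
import time
from array import array


def make_data(n):
    # Hypothetical dataset: n float64 values (assumption for illustration).
    return array("d", (i * 0.5 for i in range(n)))


def to_csv_text(values, ncols=4):
    # Serialize the values as CSV text, ncols values per row.
    buf = io.StringIO()
    writer = csv.writer(buf)
    for i in range(0, len(values), ncols):
        writer.writerow(values[i:i + ncols])
    return buf.getvalue()


def parse_csv(text):
    # Text path: split every row, then convert every field with float().
    out = array("d")
    for row in csv.reader(io.StringIO(text)):
        out.extend(float(x) for x in row)
    return out


def parse_binary(blob):
    # Binary path: copy the raw float64 bytes straight into an array.
    out = array("d")
    out.frombytes(blob)
    return out


def timed(fn, arg):
    # Return the function's result and the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


values = make_data(200_000)
csv_text = to_csv_text(values)
blob = values.tobytes()

from_csv, csv_seconds = timed(parse_csv, csv_text)
from_bin, bin_seconds = timed(parse_binary, blob)

print(f"CSV:    {len(csv_text):>9} chars parsed in {csv_seconds:.4f} s")
print(f"binary: {len(blob):>9} bytes loaded in {bin_seconds:.4f} s")
```

Both paths recover the same values; only the cost of getting there differs. A real binary format would also need to record dtype, shape, and byte order, which this sketch leaves out.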

This notebook might interest you: http://nbviewer.ipython.org/gist/mrocklin/c16c5c483b2b9859de... , particularly the sections starting at "Eleven minutes is a long time." It compares CSV costs (minutes) to custom binary storage formats (seconds) on a 20 GB dataset.