Hacker News
by ivanbakel 1763 days ago
>CSVs also are great because you can parse them one row at a time. This makes for a very scale-able and memory-efficient way of processing very large files containing millions of rows.

Even RFC 4180-compliant CSVs can be incredibly memory-inefficient to parse. If you encounter a quoted field, you must scan ahead to the next unescaped quote to discover how large the field is, since every newline you encounter before that quote is part of the field contents. Field sizes (and therefore row sizes) are unbounded, and much harder to determine than simply looking for newlines. If you were to naively treat CSV as a "memory-efficient" format to parse, you would create a parser that is easy to blow up with a trivially constructed large file.
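To make that concrete, here is a rough Python sketch (not a benchmark, and not anyone's production parser). The inputs `sample` and `unterminated` and the helpers `count_naive` and `parse` are made up for illustration. It compares splitting on newlines with the stdlib `csv` reader, then shows what a single unclosed quote does to the size of one field.

```python
import csv
import io


def count_naive(text: str) -> int:
    """Count 'rows' by treating every newline as a record boundary."""
    return len(text.splitlines())


def parse(text: str) -> list[list[str]]:
    """Parse with the stdlib reader, which tracks quoting across lines."""
    # newline="" keeps line endings untranslated, as the csv docs recommend.
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical input: a header plus two records; the first record has a
# quoted field with embedded newlines.
sample = 'id,note\r\n1,"line one\nline two\nline three"\r\n2,plain\r\n'

# Hypothetical hostile input: a quote is opened and never closed, so every
# following line belongs to the same field until end of input.
unterminated = '1,"' + "x\n" * 1000

# The naive count and the quote-aware count disagree on the same bytes.
print(count_naive(sample), len(parse(sample)))

# The reader has to hold the whole unclosed field before it can emit a row.
rows = parse(unterminated)
print(len(rows), len(rows[0][1]))
```

Scale the `1000` up and the "one row at a time" parser is holding most of the file in a single field. The stdlib does expose `csv.field_size_limit()` to cap this, which is the kind of guard a streaming parser needs if it wants bounded memory.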