by fuhrysteve
1763 days ago
Yeah, this is pretty much it. The author complains about CSVs being "notoriously inconsistent" as though switching to some other format would magically change that. They're only inconsistent because lazy programmers sometimes do ",".join(mylist) instead of using an RFC 4180-compliant CSV writer. Those same lazy programmers will just produce non-compliant versions of whatever magic format OP is dreaming about. Case in point: trailing commas in JSON objects, and other ridiculous things people have come up with, such as encoding a date in JSON like this: "\/Date(628318530718)\/" https://docs.microsoft.com/en-us/previous-versions/dotnet/ar...

CSVs are also great because you can parse them one row at a time. This makes for a very scalable and memory-efficient way of processing very large files containing millions of rows.

Let there be no mistake: Everyone reading this today will retire long before CSVs retire. And that's just fine by me.
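A minimal sketch of the difference between the two approaches, using Python's standard-library `csv` module. The sample row and the `iter_rows` helper are hypothetical, for illustration only. The `csv.writer` quoting shown here follows the scheme RFC 4180 describes: quote fields containing delimiters, quotes, or line breaks, and double embedded quotes. `iter_rows` reads lazily, holding one parsed record at a time rather than the whole file.

```python
import csv
import io

# Hypothetical sample row: the fields contain a comma, embedded quotes, and a newline.
row = ["Smith, John", 'said "hi"', "line one\nline two"]

# The "lazy" approach: a plain join with no quoting or escaping at all.
naive = ",".join(row) + "\r\n"

# csv.writer (default dialect, QUOTE_MINIMAL) quotes any field containing the
# delimiter, the quote character, or a line break, and doubles embedded quotes.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)
compliant = buf.getvalue()


def iter_rows(fileobj):
    """Hypothetical helper: yield parsed rows one at a time instead of loading everything."""
    yield from csv.reader(fileobj)


naive_first = next(iter_rows(io.StringIO(naive, newline="")))
compliant_first = next(iter_rows(io.StringIO(compliant, newline="")))

print(naive_first)      # 4 fields: the comma in "Smith, John" split it, and the newline ends the row early
print(compliant_first)  # round-trips to the original 3 fields
```

For a real file, the `csv` documentation recommends opening it with `newline=""`, e.g. `with open(path, newline="") as f:` and then iterating `iter_rows(f)`.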
Even RFC 4180-compliant CSVs can be incredibly memory-inefficient to parse. If you encounter a quoted field, you must keep reading until its closing quote (one that is not part of a doubled "" escape) to discover how large the field is, since every newline you encounter along the way is part of the field's contents. Field sizes (and therefore row sizes) are unbounded, and much harder to determine than by simply looking for newlines. If you naively treated CSV as a "memory-efficient" format to parse, you would create a parser that is easy to blow up with a trivially constructed large file.
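A minimal Python sketch of both points, assuming the standard-library `csv` module is the parser. The `count_rows_bounded` helper and the sample inputs are hypothetical. First, a single quoted field can span many physical lines, so a row is not a line. Second, `csv.field_size_limit()` caps how large one field may grow before the reader raises `csv.Error`. That limit is a module-wide setting, and it only bounds the parser's per-field buffer. It does not stop the underlying file object from reading one very long physical line into memory.

```python
import csv
import io


def count_rows_bounded(fileobj, max_field_chars):
    """Hypothetical helper: stream a CSV and count logical rows, rejecting oversized fields.

    csv.field_size_limit() is global to the csv module, so the previous limit is
    restored afterwards. The reader raises csv.Error once a field exceeds the limit.
    """
    old_limit = csv.field_size_limit(max_field_chars)
    try:
        return sum(1 for _ in csv.reader(fileobj))
    finally:
        csv.field_size_limit(old_limit)


# A header row plus one data row whose quoted field contains ten newlines.
multiline = 'id,comment\r\n1,"' + "x\n" * 10 + '"\r\n'
physical_lines = multiline.count("\n")
logical_rows = count_rows_bounded(io.StringIO(multiline, newline=""), 1024)

# Hypothetical hostile input: an opening quote followed by a 10,000-character field.
hostile = 'id,comment\r\n1,"' + "a" * 10_000
try:
    count_rows_bounded(io.StringIO(hostile, newline=""), 1024)
    rejected = False
except csv.Error:
    rejected = True

print(physical_lines, logical_rows, rejected)
```

The limit of 1024 characters is arbitrary; a real service would pick a bound that fits its data.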