

by jillesvangurp 683 days ago

Most major languages have decent libraries, frameworks, and tools for dealing with CSV. Those tend to have lots of tests for all the well-known issues and edge cases. Especially in the Python world, which is used for a lot of data processing, tooling is not really an issue, but most other languages also have decent frameworks. Most of that stuff covers the few standards that exist for CSV, plus the (quite numerous) well-known variants of the format that are out there, and can deal with the quirks of each.

The only time people get into trouble with CSV is when they skip those tools, hack something together themselves, and then get it wrong.
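To make that concrete, here is a minimal sketch contrasting a hand-rolled "split on newlines, then on commas" parser with Python's standard-library `csv` module. The sample record and the function names `naive_parse` and `library_parse` are hypothetical illustrations, not taken from any particular project. The record's second field contains a comma, doubled quotes, and an embedded newline, all of which are legal inside a quoted CSV field.

```python
import csv
import io

# Hypothetical sample: one header row plus one record whose quoted second
# field contains a comma, escaped (doubled) quotes, and an embedded newline.
data = 'id,comment\r\n1,"Hello, ""world""\nsecond line"\r\n'


def naive_parse(text):
    # Hand-rolled approach: split into lines, then split each line on commas.
    # This ignores quoting entirely, so it breaks the record apart.
    return [line.split(",") for line in text.strip().splitlines()]


def library_parse(text):
    # The csv module handles quoted fields, doubled quotes, and newlines
    # inside quotes. newline="" follows the csv docs' advice for file input.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(naive_parse(data))    # three mangled rows
    print(library_parse(data))  # two correct rows
```

The naive version produces three rows with fragments like `'"Hello'` and `'second line"'`, while the library version returns the intended two rows with the comment field intact. Those are exactly the edge cases the existing tools already test for.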

> The new format would ideally have types, the files would be sharded and have metadata to quickly scan them

There's no need for new stuff. It would be redundant, as several existing formats already do these things, and adding more isn't helpful. The problem is that most of the software that supports CSV supports none of those formats, and retrofitting a lot of ancient systems with, e.g., Parquet support is mission impossible. CSV's principal feature is that it is simply everywhere. That's hard to replicate; people have been trying for decades.