by arp242 693 days ago
> Many tools are getting it wrong.

They're not getting it wrong, they're just assuming a different variant.

There is no "standard" for CSV. Yes, there's an RFC, published in 2005, about 30 years after everyone was already using CSV. That's too late. You can't expect people to drop all compatibility just because someone published some document somewhere. RFC 4180 explicitly says that "it does not specify an Internet standard of any kind", although many people do take it as a "standard". But even if it did call itself a standard: it's still just some document someone published somewhere.

They should have just created a new "Comma Separated Data" (file.csd) standard or something instead of trying to retroactively redefine something that already exists. Then applications could add that as a new option, rather than "CSV, but different from what we already support". That was always going to be an uphill battle.

Never mind that RFC 4180 is just insufficient by not specifying character encodings in the file itself, as well as some other things such as delimiters. If someone were to write a decent standard and market it a bit, then I could totally see this taking off, just as TOML "standardized INI files" took off.
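To make the "different variant" point concrete, here is a small sketch using Python's standard `csv` module. The sample line and the function name `parse_as` are made up for illustration. Nothing in the file itself says which delimiter was intended, so both readings are "valid" and the reader just has to guess or be told.

```python
import csv
import io

# One made-up line of data. Nothing in it declares which delimiter was meant.
SAMPLE = '1,5;"a;b";x\r\n'

def parse_as(text, delimiter):
    """Parse `text` assuming the given delimiter; quotes follow the usual "" convention."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, quotechar='"')
    return list(reader)

# Comma variant: two fields, and the quotes end up as literal characters
# because they do not start a field.
as_comma = parse_as(SAMPLE, ",")

# Semicolon variant (common where the comma is the decimal separator):
# three fields, and the quoted semicolon is kept inside the second field.
as_semicolon = parse_as(SAMPLE, ";")

print(as_comma)
print(as_semicolon)
```

Neither parse raises an error, which is arguably the real problem: a wrong guess silently yields different data.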


RFC 4180 says it "documents the format that seems to be followed by most implementations" and in practice I find that to be true, though my CSVs don't interact with a lot of very old software. You get very far by treating "RFC 4180, UTF-8" as a standard and considering every implementation that doesn't follow it to be broken. I'm not sure I have ever seen software that simultaneously doesn't follow the RFC, but does consistently support escaping.
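As a rough sketch of what "RFC 4180, UTF-8" can look like in practice, assuming Python's standard `csv` module: comma delimiter, CRLF line endings, fields quoted only when needed, and embedded quotes doubled. The helper names `to_rfc4180` and `from_rfc4180` and the sample rows are made up for illustration.

```python
import csv
import io

def to_rfc4180(rows):
    """Serialize rows with comma delimiters, CRLF line endings, and "" escaping."""
    buf = io.StringIO(newline="")  # newline="" so the CRLF is not translated
    writer = csv.writer(
        buf,
        delimiter=",",
        quotechar='"',
        doublequote=True,          # an embedded " is written as ""
        lineterminator="\r\n",
        quoting=csv.QUOTE_MINIMAL, # quote only fields that need it
    )
    writer.writerows(rows)
    return buf.getvalue()

def from_rfc4180(text):
    """Parse text written with the same conventions back into lists of strings."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=",",
        quotechar='"',
        doublequote=True,
    )
    return list(reader)

rows = [
    ["name", "note"],
    ["Ann", 'said "hi", then left'],  # embedded quotes and a comma
    ["Bob", "line one\nline two"],    # embedded line break
]

encoded = to_rfc4180(rows).encode("utf-8")       # the "UTF-8" half of the convention
decoded = from_rfc4180(encoded.decode("utf-8"))
print(decoded == rows)
```

The encoding is still a convention between writer and reader here; the file itself carries no declaration of it.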
Did TOML take off? As much as I love it, it seems really rare to see in the wild. I still see YAML everywhere and despair.
It's in the standard library for Python, Rust, Julia, and maybe some other languages. It's also widely used in those ecosystems (pyproject.toml, Cargo.toml). I think it's fair to say it took off, even though YAML is also popular.
The tomllib library in Python 3.11+ can only read TOML files, not write them.
I don't believe it's in the standard library for Rust, even if it is very popular in the Rust ecosystem.
Right; I'm not super-familiar with Rust and how exactly they organise things, but it's in more or less every Rust project due to Cargo.toml.
Rust uses it, and Rust seems pretty popular.

I know Alire, the Ada crate manager, uses it too.

I use it for some personal projects. It's really nice!

Which is hilarious when you consider that the spec is that complex.
TOML is both great and terrible. I'm not a fan of how it handles some deeper arrays.
> someone were to write a decent standard and market it a bit, then I could totally see this taking off, just as TOML "standardized INI files" took off.

Why? We have xlsx for the office crowd and arrow for the HPC crowd. In no universe does anyone actually have to invent another tabular data format using delimiters.

Neither is a universal replacement for CSV. They're not even text formats (well, technically xlsx is if you extract the XML from the zip, but practically: not really). The article already explains why, as the title says, "CSV is still king": it's widely used, it's simple, it's used all over the place, it's universal, it's human-readable-y.