Hacker News new | ask | show | jobs
by majkinetor 1287 days ago
With gzip on web server the difference is not important at all.

CSV in general is problematic as there is no standard (RFC 4180 is not). In certain contexts this surely can be good solution but definitelly not good in general scenario.

2 comments

As Wikipedia puts it, "CSV is widely used to refer to a large family of formats that differ in many ways". If there's a canonical standard, it appears to be RFC4180: https://www.rfc-editor.org/rfc/rfc4180
It appears, but its not. I have not found single program so far that conforms only to this RFC and nothing else.

From the RFC itself:

   Status of This Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.
> I have not found single program so far that conforms only to this RFC and nothing else.

Wouldn't that be impossible, given that parsers have to accept all kind of bizarro CSV flavors? Maybe more importantly, do you know of a single program or single CSV library that doesn't support reading or writing CSV as defined by the RFC?

Yeah, any of them. Just add new line in the "cell" and then go jump from the bridge.
An "Internet Standard" is just a designation that has been given to an RFC that has been blessed in a certain way. See https://www.rfc-editor.org/ for more details, but the set of designations is:

    * Uncategorised
    * Historic
    * Experimental
    * Informational
    * Best Current Practice
    * Proposed Standard
    * Draft Standard
    * Internet Standard
Once an RFC reaches "Internet Standard" it is given a special designation, e.g. STD-63 is the standards designation for RFC-3629: UTF-8 < https://www.rfc-editor.org/info/std63 >. See https://www.rfc-editor.org/standards

Being an "Internet Standard" is kinda special, but not especially so. For example, IMAP4, originally specified in RFC-3501 in March 2003, updated many times since, and revised in RFC-9051 in August 2021, is still a "Proposed Standard" without an STD designation, nearly 20 years and dozens of interoperable implementations later.

"Rough consensus and running code" is how things get done.

RFC-4180 is plenty good enough a "standard" for people to decide to interoperate over. They just have to decide to do so.

(Note also that HTML5 is not an "Internet Standard" according to the IETF et al. The last version to get an RFC was HTML 2 in RFC-1866, designated "Historic". And interoperability was an issue for a while with later versions of HTML during the "Best viewed in Internet Explorer/Netscape Navigator" wars. To get interoperability like we eventually did, you don't need an "Internet Standard"; you just need implementers who want to interoperate, and are willing to favour it over lock-in, and even over strict backwards-compatibility.)

(Also, the "and nothing else" clause in your comment confuses me. Why not support other formats/variants also? "Be liberal in what you accept" is certainly something that you probably want to avoid if you're designing a new format/protocol that no-one else is using yet, but if you're working with a decades-old format that was traditionally poorly-specified, with millions of documents out in the wild, it's probably the best way to allow existing users to move forward.)

That was my first thought: JSON is highly amenable to compression; due to the repetition this blog is complaining about. It's a good lesson for junior devs: if you find yourself thinking about saving bits and bytes with custom protocols, you need to pull out of the rabbit hole and find the existing solution to your problem.

Sure, for a local data file or something where it's nice to be human-readable-ish, CSV can be a better choice than JSON (assuming you use a library for all the edge cases and string escapes.) If you really want a super-small and fast serialization, that's what protobuf is for.