Hacker News new | ask | show | jobs
by nickpeterson 1648 days ago
It’s funny, csv files are so common and yet many mainstream languages don’t even attempt a decent parser baked in. I think dotnet has 3-4 different ones and as I recall they’re all pretty slow.
1 comments

There's multiple dialects of CSV. Besides the more standardish dialect there are some weird ones that prevent some types of optimization. I remember Apple's "Enterprise Partner Feed" had a dialect I've never seen elsewhere so far. Columns were separated by 0x01, rows were separated by 0x02 0x0A.

The row separator being two bytes throws a wrench in most parsers.

What a bizarre choice. If they're going to commit to weird ASCII control chars you'd think they could just use 0x1C to 0x1F, which are explicitly intended as delimiters/Separators... sigh. (I've always wondered why more people don't use the various Separators, but I admit human-readability is a big advantage)
> The row separator being two bytes throws a wrench in most parsers.

Huh? Anything that ingests Windows-origin files needs to be capable with \r\n by default.