by jwandborg 1648 days ago
There are multiple dialects of CSV. Besides the more standardish dialect, there are some weird ones that prevent certain types of optimization. I remember Apple's "Enterprise Partner Feed" had a dialect I haven't seen anywhere else so far: columns were separated by 0x01 and rows by 0x02 0x0A.

The row separator being two bytes throws a wrench in most parsers.
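For illustration, here is a minimal Python sketch of splitting data in the dialect described above. The function name `parse_epf` and the choice to skip empty records are my own assumptions, not anything from Apple's tooling, and it reads the whole input into memory rather than streaming.

```python
def parse_epf(data: bytes,
              col_sep: bytes = b"\x01",
              row_sep: bytes = b"\x02\n") -> list[list[bytes]]:
    """Split a byte string into rows and columns.

    Hypothetical helper: assumes columns end with 0x01 and rows end
    with the two-byte sequence 0x02 0x0A, as described in the comment.
    """
    rows = []
    for record in data.split(row_sep):
        # The input normally ends with a row separator, which leaves one
        # empty trailing chunk; skip it (and any other empty record).
        if not record:
            continue
        rows.append(record.split(col_sep))
    return rows


sample = b"id\x01name\x02\n1\x01multi\nline\x02\n"
print(parse_epf(sample))
```

Since the split is on the full two-byte sequence, a bare 0x0A inside a field stays part of that field. That is exactly what a parser built around single-byte line endings would get wrong.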

2 comments

What a bizarre choice. If they're going to commit to weird ASCII control chars you'd think they could just use 0x1C to 0x1F, which are explicitly intended as delimiters/Separators... sigh. (I've always wondered why more people don't use the various Separators, but I admit human-readability is a big advantage)
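A rough sketch of what using those separators could look like in Python, taking 0x1F (Unit Separator) between fields and 0x1E (Record Separator) between rows. The helper names `dumps` and `loads` are made up for this example, and it assumes the field values never contain those two characters.

```python
US = "\x1f"  # ASCII Unit Separator, used here between fields
RS = "\x1e"  # ASCII Record Separator, used here between rows


def dumps(rows: list[list[str]]) -> str:
    """Join rows and fields with RS and US (hypothetical helper)."""
    return RS.join(US.join(fields) for fields in rows)


def loads(text: str) -> list[list[str]]:
    """Inverse of dumps; an empty string yields no rows."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]


rows = [["name", "note"], ["Ann", 'says "hi", then\nleaves']]
assert loads(dumps(rows)) == rows
```

No quoting or escaping is needed for commas, quotes, or newlines, but the output is not pleasant to read in a text editor.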
> The row separator being two bytes throws a wrench in most parsers.

Huh? Anything that ingests Windows-origin files needs to cope with \r\n by default.