	by nickpeterson 1648 days ago
	It’s funny: CSV files are so common, and yet many mainstream languages don’t even attempt to ship a decent parser baked in. I think .NET has 3-4 different ones, and as I recall they’re all pretty slow.

jwandborg 1648 days ago

There are multiple dialects of CSV. Besides the more or less standard dialect, there are some weird ones that rule out certain kinds of optimization. I remember Apple's "Enterprise Partner Feed" had a dialect I haven't seen anywhere else. Columns were separated by 0x01, and rows were separated by 0x02 0x0A.

The row separator being two bytes throws a wrench in most parsers.
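
A two-byte row separator mainly bites streaming parsers: when input arrives in chunks, the 0x02 can land at the end of one read and the 0x0A at the start of the next. Below is a minimal Python sketch of one way around that. It assumes the feed never quotes or escapes separator bytes inside fields, and the function names are hypothetical, not taken from any Apple tooling.

```python
from typing import Iterable, Iterator, List


def split_records(chunks: Iterable[bytes], sep: bytes) -> Iterator[bytes]:
    """Yield records separated by a (possibly multi-byte) separator.

    The unsplit tail is carried over to the next chunk, so a separator
    straddling a chunk boundary (e.g. b"\x02" | b"\n") is still found.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        # Everything before the last separator is complete; the final piece
        # may be a partial record (or end in half a separator), so keep it.
        *complete, buf = buf.split(sep)
        yield from complete
    if buf:
        yield buf  # trailing record with no final separator


def parse_feed(chunks: Iterable[bytes],
               field_sep: bytes = b"\x01",
               row_sep: bytes = b"\x02\n") -> Iterator[List[bytes]]:
    """Hypothetical reader for the 0x01 / 0x02 0x0A dialect described above."""
    for record in split_records(chunks, row_sep):
        yield record.split(field_sep)
```

Holding back the unsplit tail is what makes the chunk boundary harmless. A reader that splits on 0x0A alone would instead have to strip a stray 0x02 from the end of every row.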

ipdashc 1648 days ago

What a bizarre choice. If they were going to commit to weird ASCII control characters, you'd think they could just use 0x1C to 0x1F, which are explicitly intended as delimiters (the File, Group, Record, and Unit Separators)... sigh. (I've always wondered why more people don't use the various separators, but I admit human-readability is a big advantage.)
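
For comparison, here is a sketch of a format that puts the Unit Separator (0x1F) between fields and the Record Separator (0x1E) after each record. It assumes fields never contain those two bytes; under that assumption commas, quotes, and newlines in the data need no escaping at all. `dump` and `load` are hypothetical names.

```python
from __future__ import annotations

US = "\x1f"  # ASCII Unit Separator: between fields
RS = "\x1e"  # ASCII Record Separator: terminates each record


def dump(rows: list[list[str]]) -> str:
    # Assumes no field contains US or RS; nothing else needs escaping.
    return "".join(US.join(row) + RS for row in rows)


def load(text: str) -> list[list[str]]:
    # dump() terminates every record with RS, so splitting leaves one
    # empty piece after the final separator; drop it.
    records = text.split(RS)
    if records[-1] != "":
        raise ValueError("input does not end with a record separator")
    return [rec.split(US) for rec in records[:-1]]
```

The trade-off is exactly the one mentioned above: the result round-trips cleanly, but the separators are invisible in most editors and terminals.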

mschuster91 1648 days ago

> The row separator being two bytes throws a wrench in most parsers.

Huh? Anything that ingests Windows-origin files needs to handle \r\n by default.
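
Python's standard `csv` module shows both sides of this. Its reader handles \r\n out of the box and accepts a one-character delimiter such as 0x01. However, its documentation says the reader recognises only '\r' or '\n' as end-of-line and ignores `lineterminator`. The sketch below reads the 0x02 0x0A dialect with it, so it has to strip the leftover 0x02 by hand. It assumes no field contains a bare CR or LF, and `read_feed` is a hypothetical name.

```python
import csv
import io


def read_feed(text: str) -> list:
    rows = []
    # newline="" leaves \r\n intact for the csv module; QUOTE_NONE stops a
    # leading '"' in a field from being treated as a quote character.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\x01", quoting=csv.QUOTE_NONE)
    for row in reader:
        # The reader ends the row at \n, leaving the 0x02 half of the
        # row separator glued to the last field.
        if row and row[-1].endswith("\x02"):
            row[-1] = row[-1][:-1]
        rows.append(row)
    return rows


# Ordinary \r\n input needs no special handling at all:
crlf_rows = list(csv.reader(io.StringIO("a,b\r\nc,d\r\n", newline="")))
```

The split between the two cases is that \r\n is special-cased by the parser, while an arbitrary two-byte terminator is not.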
