| HN Mirror

	by wongarsu 3583 days ago
	How would a CSV parser even break UTF-8 encoding (by accident)? All CSV control characters (comma, double quote, and newline) map to the same code points in ASCII and UTF-8, and no non-ASCII UTF-8 character uses any ASCII byte in its encoding.
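
The claim above can be checked by brute force. This is a minimal sketch, not part of the original comment: it encodes every valid non-ASCII code point and looks for any byte below 0x80. The helper names `hasASCIIByte` and `countViolations` are made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// hasASCIIByte reports whether the UTF-8 encoding of r contains
// any byte in the ASCII range (below 0x80).
func hasASCIIByte(r rune) bool {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	for _, b := range buf[:n] {
		if b < 0x80 {
			return true
		}
	}
	return false
}

// countViolations walks every valid non-ASCII code point and counts
// those whose encoding contains an ASCII byte.
func countViolations() int {
	count := 0
	for r := rune(0x80); r <= utf8.MaxRune; r++ {
		if !utf8.ValidRune(r) {
			continue // surrogate halves are not encodable
		}
		if hasASCIIByte(r) {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println("violations:", countViolations()) // prints "violations: 0"
}
```

Every lead and continuation byte of a multi-byte sequence has its high bit set, so a byte-wise scan for `,`, `"`, or `\n` cannot land in the middle of a character.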

2 comments

ktRolster 3583 days ago

I've seen one break because of the byte-order mark (BOM) that sometimes gets added to UTF-8. I don't remember the details of why that broke it; I just remember that it worked fine on everything except that.

sp332 3582 days ago

UTF-8 doesn't have a "byte order", so I thought it wouldn't have a byte-order mark. But apparently some software adds one anyway. https://en.wikipedia.org/wiki/UTF-8#Byte_order_mark

weberc2 3582 days ago

Go lets you use any code point as the delimiter.
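
For reference, this is the `Comma` field on `csv.Reader` in Go's `encoding/csv`, which is a `rune`. A small sketch with a multi-byte delimiter; the `splitOn` helper and the sample line are made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// splitOn parses a single CSV record from line using delim as the
// field separator instead of the default comma.
func splitOn(delim rune, line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = delim
	return r.Read()
}

func main() {
	fields, err := splitOn('→', "naïve→café→42\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(fields), fields) // prints "3 [naïve café 42]"
}
```

The delimiter still has to be a valid rune and cannot be a quote, `\r`, or `\n`; otherwise `Read` returns an error.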
