by wongarsu 3583 days ago
How would a CSV parser even break UTF-8 encoding (by accident)? All CSV control characters (comma, double quote, and newline) have the same code points in ASCII and UTF-8, and no non-ASCII UTF-8 character uses any ASCII byte in its encoding.
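To make that property concrete, here is a minimal Go sketch (the function names are made up for illustration, and it ignores quoting entirely). It splits a line on the raw comma byte without decoding anything and then checks that every field is still valid UTF-8. This works because every byte of a multi-byte UTF-8 sequence has its high bit set, so it can never equal an ASCII byte such as `,`.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// splitFieldsBytewise splits one unquoted CSV line on the ASCII comma byte.
// It never decodes UTF-8; it only looks for the single byte 0x2C.
// (Hypothetical helper for illustration, not a real CSV parser.)
func splitFieldsBytewise(line []byte) [][]byte {
	return bytes.Split(line, []byte{','})
}

// allFieldsValidUTF8 reports whether every field is still valid UTF-8
// after the byte-level split.
func allFieldsValidUTF8(fields [][]byte) bool {
	for _, f := range fields {
		if !utf8.Valid(f) {
			return false
		}
	}
	return true
}

func main() {
	line := []byte("naïve,日本語,café")
	fields := splitFieldsBytewise(line)
	fmt.Println("fields:", len(fields), "all valid UTF-8:", allFieldsValidUTF8(fields))
	for _, f := range fields {
		fmt.Printf("%s\n", f)
	}
}
```

A parser that works this way can still mangle text if it truncates fields at a fixed byte length or assumes a different encoding, but the delimiter search itself cannot cut a character in half.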
2 comments

I've seen one break because of the byte-order mark (BOM) that sometimes gets added to UTF-8. I don't remember the details of why that broke it, I just remember that it worked fine on everything except that.
UTF-8 doesn't have a "byte order", so I thought it wouldn't have a byte-order mark. But apparently some software adds one anyway. https://en.wikipedia.org/wiki/UTF-8#Byte_order_mark
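One plausible way a BOM causes trouble, sketched in Go under the assumption that the parser does not strip it: the three bytes `EF BB BF` end up glued to the first header name, so a lookup for that column fails. The helper names here are invented for the example, and this is only a guess at the kind of failure described above, not a diagnosis of that particular parser.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// utf8BOM is the UTF-8 encoding of U+FEFF, which some tools prepend to files.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// readHeader parses only the first record of the CSV data.
func readHeader(data []byte) ([]string, error) {
	return csv.NewReader(bytes.NewReader(data)).Read()
}

// stripBOM removes a leading UTF-8 BOM if one is present.
func stripBOM(data []byte) []byte {
	return bytes.TrimPrefix(data, utf8BOM)
}

func main() {
	data := append(append([]byte{}, utf8BOM...), "id,name\n1,Zoë\n"...)

	raw, err := readHeader(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	clean, err := readHeader(stripBOM(data))
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// The first header differs only by the invisible BOM prefix.
	fmt.Printf("without stripping: %q (matches \"id\": %v)\n", raw[0], raw[0] == "id")
	fmt.Printf("after stripping:   %q (matches \"id\": %v)\n", clean[0], clean[0] == "id")
}
```

Stripping the prefix before handing the data to the parser is the usual workaround; whether a given library does this for you varies.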
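Go's encoding/csv lets you use almost any code point as the delimiter: the Reader's Comma field is a rune, and it only has to be a valid rune other than \r, \n, or the Unicode replacement character.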
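For example, a short sketch using the standard library's `encoding/csv` with a non-ASCII delimiter. The `parseWithDelimiter` wrapper and the sample data are made up for illustration; `csv.NewReader`, the `Comma` field, and `ReadAll` are the real API.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseWithDelimiter reads all records from input using delim as the
// field separator. (Hypothetical wrapper around csv.Reader.)
func parseWithDelimiter(input string, delim rune) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = delim // a rune, so it may be a multi-byte code point
	return r.ReadAll()
}

func main() {
	rows, err := parseWithDelimiter("a§b§c\n1§2§3\n", '§')
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(len(row), row)
	}

	// A newline is rejected as a delimiter, so this returns an error.
	_, err = parseWithDelimiter("a\nb\n", '\n')
	fmt.Println("newline delimiter error:", err != nil)
}
```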