| HN Mirror

	by ninkendo 3213 days ago
	Agreed, assuming O(1) lookup of anything inside a string only leads to bad encoding bugs. UTF-8 everywhere, no exceptions. You can never assume a user-visible character occupies a fixed number of bytes, even if you're using UTF-32. Characters composed of several code points (a base letter plus combining marks, for example) throw that assumption out the window, as do dozens of other Unicode quirks I can't recall now.
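
	To make the UTF-32 point concrete, here is a minimal Python sketch. The helper name `sizes` and the choice of "é" as the sample character are mine, purely for illustration. It shows one visible character taking either one or two code points, so its size in UTF-32 is not fixed either.

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent:
# one visible character, two code points.
decomposed = "e\u0301"
# The same visible character as a single precomposed code point.
precomposed = "\u00e9"

def sizes(s):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for s."""
    # "utf-32-le" is used so no byte order mark is added to the count.
    return len(s), len(s.encode("utf-8")), len(s.encode("utf-32-le"))

print(sizes(decomposed))   # (2, 3, 8)
print(sizes(precomposed))  # (1, 2, 4)

# Indexing by code point splits the visible character: only the bare "e" comes back.
print(decomposed[0] == "e")  # True

# NFC normalization folds this particular pair into the precomposed form,
# but many combining sequences have no precomposed equivalent.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

	Both strings render as the same single character, yet a fixed-width index into either encoding can land in the middle of it.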