by ninkendo 3213 days ago
Agreed, assuming O(1) lookup of anything inside a string only leads to bad encoding bugs. UTF-8 everywhere, no exceptions.

You can never assume a user-visible character lines up with a fixed-size unit, whether that's a byte or a whole code point, even if you're using UTF-32. Characters composed of multiple code points (a base letter plus combining marks, for example) throw that assumption out the window, as do dozens of other Unicode quirks I can't recall now.
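
To make that concrete, here's a quick Python sketch. The `sizes` helper and the sample strings are my own, picked just for illustration. It counts code points, UTF-8 bytes, and UTF-32 bytes for strings that each render as a single user-visible character in most fonts:

```python
import unicodedata


def sizes(text):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for text."""
    # "utf-32-le" is used so no byte-order mark is added to the count.
    return (len(text), len(text.encode("utf-8")), len(text.encode("utf-32-le")))


composed = "\u00e9"      # é as one precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
# man + ZERO WIDTH JOINER + woman + ZERO WIDTH JOINER + girl
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"

print(sizes(composed))    # (1, 2, 4)
print(sizes(decomposed))  # (2, 3, 8)
print(sizes(family))      # (5, 18, 20)

# The two spellings of é are canonically equivalent, but not equal
# until they are normalized to the same form.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

So UTF-32 gives you O(1) access to code points, not to what the user sees as characters. Getting at the latter takes grapheme cluster segmentation, which as far as I know the Python standard library doesn't provide.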