
by gnrlist 3397 days ago

Sounds like you're confusing a character's integer code point with the integer (byte-level) representation of that character in a particular encoding.

Many programming languages internally represent chars as UTF-8 or UTF-16, so when you use a library to read bytes into chars and the encoding it assumes doesn't match the data, everything gets mangled.
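To make the distinction concrete, here's a rough Python sketch (my own illustration, not taken from the linked guide). The sample character and the choice of Latin-1 as the "wrong" decoder are just assumptions picked to show the effect.

```python
# One character, one code point, several possible byte representations.
ch = "é"

code_point = ord(ch)                 # the Unicode code point, an integer
utf8_bytes = ch.encode("utf-8")      # two bytes in UTF-8
utf16_bytes = ch.encode("utf-16-le") # two different bytes in UTF-16 (little-endian)

# Reading the UTF-8 bytes with the wrong decoder (Latin-1 here, chosen
# only as an example) turns each byte into its own character.
mangled = utf8_bytes.decode("latin-1")

# Decoding with the encoding the bytes were actually written in round-trips.
restored = utf8_bytes.decode("utf-8")

print(hex(code_point), utf8_bytes, utf16_bytes, repr(mangled), repr(restored))
```

The code point stays the same no matter what; only the bytes change with the encoding, and the mangling shows up when the reader guesses the encoding wrong.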

Check out this guide for a more in-depth look at the mangling that can happen: http://cweb.github.io/unicode-security-guide/background/