by gnrlist 3350 days ago
Sounds like you're confusing a character's integer code point with the integer (byte) values of its encoded representation.

Many programming languages internally represent chars as UTF-8 or UTF-16, so when you use a library to read raw bytes into chars with the wrong encoding assumed, the text gets mangled.
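
To make that concrete, here's a rough Python sketch. The sample character and variable names are just my own picks for illustration, and Latin-1 stands in for whatever wrong encoding a library might assume:

```python
# Illustrative sketch: the sample character and names are hypothetical.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The code point is a single integer that identifies the character.
code_point = ord(ch)

# The encoded form is a byte sequence that depends on the encoding.
utf8_bytes = ch.encode("utf-8")       # two bytes: 0xC3 0xA9
utf16_bytes = ch.encode("utf-16-le")  # two bytes: 0xE9 0x00

# Decoding with the wrong encoding turns one character into two.
mangled = utf8_bytes.decode("latin-1")

# Decoding with the right encoding round-trips cleanly.
restored = utf8_bytes.decode("utf-8")

print(hex(code_point), list(utf8_bytes), list(utf16_bytes))
print(repr(mangled), repr(restored))
```

The code point (0xE9) never shows up as a byte in the UTF-8 form, which is why treating bytes as code points (or the other way round) gives garbage.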

Check out this guide for a more in-depth look at the mangling that can happen: http://cweb.github.io/unicode-security-guide/background/