by bklimt 4389 days ago
"In general, a character can be represented in 1 byte or 2 bytes. Let's say 1-byte character is ANSI character - all English characters are represented through this encoding. And let's say a 2-byte character is Unicode, which can represent ALL languages in the world."

No. A character can be three or four bytes, depending on the encoding (UTF-8, for example, uses one to four bytes per code point). I think he meant ASCII, not ANSI. And no, two-byte characters are not "Unicode": Unicode is a character set, and UTF-8, UTF-16, and UTF-32 are different ways of encoding it as bytes. I feel like this article might do a disservice to folks who aren't totally clear about Unicode before they read it. I would strongly recommend reading Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" and being totally clear on that before trying to read this.
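
To make the byte-count point concrete, here is a quick Python sketch (my own illustration, not something from the article) that prints how many bytes a few sample characters take in UTF-8, UTF-16, and UTF-32. The helper name encoded_lengths and the choice of sample characters are just assumptions for the demo.

```python
# Sample code points: 'A', 'é', '€', and an emoji outside the Basic Multilingual Plane.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_lengths(ch):
    """Return the number of bytes a single character takes in each encoding.

    The "-le" variants are used so that no byte-order mark is counted.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

The same character comes out at one to four bytes in UTF-8 and at two or four bytes in UTF-16, so "1 byte = English, 2 bytes = Unicode" does not hold for either encoding.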