by throwaway2048 2818 days ago
This advice is very dangerous. Both Shift JIS and UTF-16 (two of the most common non-UTF-8 encodings) can contain bytes that really are ASCII code points in the 0-127 range, as well as bytes that look like 0-127 ASCII but are actually the second part of a multi-byte sequence and do not represent the equivalent ASCII characters at all.
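
A minimal Python sketch of that overlap, assuming CPython's built-in `shift_jis`, `utf-16-le`, and `utf-8` codecs. The sample characters and the helper name `ascii_range_bytes` are picked purely for illustration:

```python
def ascii_range_bytes(text, encoding):
    """Return the encoded bytes of `text` that fall in the ASCII range 0x00-0x7F."""
    return [b for b in text.encode(encoding) if b < 0x80]

# Katakana "ソ" is the two-byte sequence 0x83 0x5C in Shift JIS.
# The trail byte 0x5C has the same value as an ASCII backslash.
sjis = "ソ".encode("shift_jis")

# U+4141 (a CJK ideograph) is 0x41 0x41 in UTF-16LE, byte for byte the same as "AA".
utf16 = "\u4141".encode("utf-16-le")

# For contrast: UTF-8 never uses 0x00-0x7F inside a multi-byte sequence,
# so the same two characters produce no ASCII-range bytes at all.
utf8_hits = ascii_range_bytes("ソ\u4141", "utf-8")

print(sjis, ascii_range_bytes("ソ", "shift_jis"))
print(utf16, ascii_range_bytes("\u4141", "utf-16-le"))
print(utf8_hits)
```

A scanner that treats every byte below 0x80 as an ASCII character would see a backslash in the Shift JIS data and two letters in the UTF-16 data, none of which are in the text.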

Note that they said "ASCII-compatible encoding". You're right to point out the problem with Shift JIS and others, but those aren't ASCII-compatible. UTF-8 and the ISO-8859 series, by contrast, are ASCII-compatible: if a byte looks like an ASCII character, it is one.
The point is that certain text, especially in Shift JIS and the various EUC encodings, can look exactly like 8-bit "extended ASCII" when it's in fact a variable-width 8-16 bit encoding.

It's bad advice that leads to corruption.

If you already know the encoding, then OP's advice is useless. If you don't, but suspect it's an 8-bit extended ASCII encoding, it might not be, because the aforementioned encodings look exactly like an 8-bit encoding.
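
To make the "looks exactly like an 8-bit encoding" point concrete, here is a rough Python sketch. It assumes CPython's built-in `shift_jis`, `euc_jp`, and `latin-1` codecs; the sample string and the helper name `misread_as_latin1` are hypothetical, chosen only for the demonstration:

```python
def misread_as_latin1(text, encoding):
    """Encode `text` with a multi-byte encoding, then decode the bytes as Latin-1.

    Latin-1 assigns a character to every byte value, so the decode never
    raises, even when the bytes came from a different encoding.
    """
    raw = text.encode(encoding)
    return raw.decode("latin-1")

original = "日本語"
for enc in ("shift_jis", "euc_jp"):
    guessed = misread_as_latin1(original, enc)
    # No exception was raised, but each 2-byte character has silently
    # turned into two unrelated single-byte characters.
    print(enc, len(original), len(guessed), guessed == original)
```

Nothing in the decode step signals a problem: the wrong guess only shows up later as garbled text, and any edit made to the misread string before re-encoding corrupts the original data.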