| This is a really good post that shines some light on how the insanity of encodings still isn't fixed today, since so many operating systems still don't use Unicode everywhere. Some of the explanations for why the characters are displayed that way are slightly incorrect, though, so here are some corrections.

I'm going to supply each example with some python3 code to reproduce it, using the following definition:

`data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"`

First, let's start at the beginning:

> My router just cut the name down to 32 octets though to stay complient
> This was what was being sent according to iw
> `a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd`

If you look at this closely, the last byte in this sequence is `\xcd`, which is the lead byte of a two-byte UTF-8 sequence whose continuation byte is missing. The router cut off the final `\x84` (along with the three additional `a` characters).

> with the raw hex being
> `97ccb6cc81cc93ccbfcc88cc9bcc9bcd90cd98cd86cc90cd9dcc87cc92cc91cd`

Small mistake: the hex of `a` is `61`, not `97` (that's decimal), but otherwise correct.

> Galaxy S8 running Android 9 with Kernel 4.4.153
> Amazon Firestick

Everything correct, except for a small detail: these two devices render the result of UTF-8 decoding while ignoring bytes that are not valid UTF-8 (in python3: `data.decode('utf-8', 'ignore')`).

> iPhone 6 running iOS 13.5.1
> Apple TV Second Generation

Completely correct. This is definitely Mac OS Roman (in python3: `data.decode('mac_roman')`).

> Windows 10 Pro 10.0.19041

This one is incorrect: Windows is interpreting the bytes in the "Windows Codepage 1252" (also known as "Western") encoding and ignoring bytes that have no character assigned (in python3: `data.decode('cp1252', 'ignore')`).

Decoding every byte separately as UTF-8 would fail, since a byte that can be a continuation of a UTF-8 character is never a valid start byte, and a start byte on its own is an incomplete sequence. Interpreting every byte as a Unicode code-point number would give something very similar, but not exactly the same: the bytes that Windows renders as a quote, caret-y thing, angle bracket-y thing, tilde, dagger, double dagger, and single quote lie in the range `\x80` to `\x9f`, which Unicode assigns to control characters at the start of the "Latin-1 Supplement" block.

> Chromebook running ChromeOS 83.0.4103.97

Correct. The Chromebook seems to have rendered the ASCII `a`, but replaced all 31 other bytes with question marks.

> Kindle Paperwhite running Firmware 5.10.2
> Vizio M55-C2 TV

Also correct. Those two devices seem to display the bytes as hex instead of falling back to question marks as the Chromebook does.

I hope this comment gave some useful insight into why these devices decoded it this way :) |
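For anyone who wants to try the decodings from this comment side by side, here is a minimal python3 sketch. It only uses the codecs named above plus `latin-1` for the "every byte is a code point" interpretation. The variable names are my own, and the question-mark line is just a guess at what the Chromebook does, not something I know about ChromeOS internals.

```python
import unicodedata

# Same bytes as the `data` definition used throughout this comment.
data = (b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b"
        b"\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd")

# Strict UTF-8 decoding fails because the final lead byte was truncated.
try:
    data.decode("utf-8")
    strict_error_position = None
except UnicodeDecodeError as exc:
    strict_error_position = exc.start  # index of the dangling \xcd

# Android / Firestick: UTF-8, dropping whatever cannot be decoded.
as_utf8 = data.decode("utf-8", "ignore")

# iOS / Apple TV: Mac OS Roman assigns a character to every byte.
as_mac_roman = data.decode("mac_roman")

# Windows: cp1252, dropping the bytes that cp1252 leaves unassigned.
as_cp1252 = data.decode("cp1252", "ignore")

# "Every byte is a code point" is exactly what latin-1 decoding does.
as_latin1 = data.decode("latin-1")
c1_controls = [c for c in as_latin1 if 0x80 <= ord(c) <= 0x9F]

# Assumption (hypothetical): keep ASCII, show "?" for every other byte.
as_question_marks = "".join(chr(b) if b < 0x80 else "?" for b in data)

for label, text in [
    ("utf-8", as_utf8),
    ("mac_roman", as_mac_roman),
    ("cp1252", as_cp1252),
    ("latin-1", as_latin1),
    ("ascii + ?", as_question_marks),
]:
    print(f"{label:<10} {len(text):>2} chars: {text!r}")

print("C1 control characters under latin-1:",
      all(unicodedata.category(c) == "Cc" for c in c1_controls))
```

The lengths are the quickest way to tell the interpretations apart: the UTF-8 result is `a` plus 15 combining marks, Mac OS Roman keeps all 32 bytes as visible characters, and cp1252 drops the four bytes it has no character for (`\x81`, `\x90` twice, `\x9d`).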