by barbegal 2178 days ago
The 802.11 standards have always allowed an SSID of up to 32 bytes, which can be filled with any data; it does not have to be in a particular encoding. In 802.11-2012 there is a separate tag, SSIDEncoding, which can be used to specify whether these bytes are UTF-8 or "unspecified". If the UTF-8 option is set, the SSID should be interpreted as UTF-8.

It is not clear in this case whether the router sets this flag or not. Either way, there is no stipulation in the spec about how the UTF-8 characters should be displayed, so many of these options are potentially valid.


The bytestring was truncated after 32 bytes, in the middle of a UTF-8 byte sequence. This means the resulting truncated string is no longer valid UTF-8. So my guess is that most devices decide "if it's not valid UTF-8, it must be $LEGACY_ENCODING".
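A minimal Python sketch of that truncation. The SSID here is made up for illustration (two ASCII letters followed by eight 4-byte emoji), not the one from the article:

```python
# Hypothetical SSID, not the one from the article: 2 ASCII bytes plus
# eight 4-byte emoji gives 34 bytes of UTF-8, two over the 32-byte limit.
ssid = "ab" + "\U0001F600" * 8
raw = ssid.encode("utf-8")

# Cutting at a fixed byte count ignores character boundaries, so the last
# emoji is left with only its first two bytes.
truncated = raw[:32]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(len(raw), is_valid_utf8(raw))
print(len(truncated), is_valid_utf8(truncated))
```

A device that receives `truncated` has no way to recover the missing bytes, so it has to either reject the name, patch it up, or guess at another encoding.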
Unicode offers two ways forward when you can't decode what you have. One alternative is an exception: you just fail because you weren't able to decode something.

The other is that, for any code unit that won't decode, you emit U+FFFD, the Unicode Replacement Character, and then you carry on decoding.

For humans, U+FFFD makes it obvious something is wrong; it's typically visualised as a black diamond with a white question mark. And for a machine it shouldn't match parsing rules: it isn't an alphanumeric, it isn't any of the common separator or spacing characters, so it's unlikely to be of use in an attack.

That is a reasonable approach if you know that what you are decoding is supposed to be UTF-8.
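Both options map onto the `errors` argument of Python's `bytes.decode`. A small sketch, using a made-up input (the word "café" cut off halfway through the "é"):

```python
# "café" in UTF-8, cut off after the first byte of the two-byte "é".
data = b"caf\xc3"


def decode_strict(raw: bytes) -> str:
    # errors="strict" is the default: raises UnicodeDecodeError on bad input.
    return raw.decode("utf-8", errors="strict")


def decode_lenient(raw: bytes) -> str:
    # errors="replace" substitutes U+FFFD for what cannot be decoded.
    return raw.decode("utf-8", errors="replace")


try:
    decode_strict(data)
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

print(decode_lenient(data))
```

The lenient result is "caf" followed by a single U+FFFD, which is what most people would recognise as the "question mark in a diamond".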

If you don't know the text encoding because there is no information to indicate it (or you don't trust that information to be correct) then you will have to guess and "decode as UTF-8 for valid UTF-8, use some legacy encoding otherwise" is a common approach (used e.g. by many text editors).
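A sketch of that guessing heuristic in Python. The function name `guess_decode` and the choice of Latin-1 as the fallback are assumptions for illustration; real devices and editors may pick a different legacy encoding:

```python
def guess_decode(data: bytes, legacy: str = "latin-1") -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else fall back to a legacy encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback never fails,
        # but it may produce mojibake if the guess is wrong.
        return data.decode(legacy)


print(guess_decode("café".encode("utf-8")))  # valid UTF-8, decoded as such
print(guess_decode(b"caf\xe9"))              # genuine Latin-1 input
print(guess_decode(b"caf\xc3"))              # truncated UTF-8, misread as Latin-1
```

The last call shows the failure mode from the truncated SSID: one cut-off byte is enough to flip the whole string over to the legacy interpretation.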

I cannot believe I did not notice that. I will rerun all of my testing with a valid UTF-8 byte sequence :)
Huh, I'm surprised emojis aren't more popular for SSIDs... can't wait until this knowledge spreads more and we'd have a vomit of color when we open the "Wireless Networks" menu.

OTOH for most people the SSID is "Linksys 4FBD" or similar...

> OTOH for most people the SSID is "Linksys 4FBD" or similar...

And to think that one of the major reasons behind having random strings after <Vendor name> (apart from non-technical people in apartment blocks being super confused) is so that you can't go around with rainbow tables that work for large swathes of the routers you would encounter.
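The reason the SSID matters for rainbow tables is that WPA2-Personal derives the pre-shared key with PBKDF2-HMAC-SHA1, using the SSID as the salt, 4096 iterations and a 256-bit output. A rough sketch with a made-up passphrase and made-up SSIDs; the helper name `wpa2_psk` is hypothetical:

```python
import hashlib


def wpa2_psk(passphrase: str, ssid: bytes) -> bytes:
    """Derive a 256-bit WPA2-Personal pre-shared key; the SSID acts as the salt."""
    return hashlib.pbkdf2_hmac(
        "sha1", passphrase.encode("utf-8"), ssid, 4096, dklen=32
    )


# Same (made-up) passphrase, two different SSIDs.
key_a = wpa2_psk("correct horse battery", b"Linksys 4FBD")
key_b = wpa2_psk("correct horse battery", b"Linksys 91C2")

print(key_a.hex())
print(key_b.hex())
```

The two keys differ, so a table precomputed for one SSID is useless against the other, which is why a shared default name like plain "Linksys" is the convenient case for an attacker.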

> can't wait until this knowledge spreads more and we'd have a vomit of color when we open the "Wireless Networks" menu.

You're limited to 32 bytes, which limits the spew somewhat. Some emoji are up to 4 bytes long, so you can in theory get a sequence of 8 of them in a row if you want. Should encourage a little bit of creativity to fit within those lines...
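The byte budget is easy to check in Python. A sketch, assuming the name is stored as UTF-8; `fits_in_ssid` is a hypothetical helper:

```python
SSID_MAX_BYTES = 32


def fits_in_ssid(name: str, limit: int = SSID_MAX_BYTES) -> bool:
    """Check the encoded UTF-8 length, not the number of characters."""
    return len(name.encode("utf-8")) <= limit


emoji = "\U0001F600"  # one emoji that takes 4 bytes in UTF-8

print(len(emoji.encode("utf-8")))
print(fits_in_ssid(emoji * 8))   # 8 * 4 = 32 bytes
print(fits_in_ssid(emoji * 9))   # 9 * 4 = 36 bytes
```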

I don't even want to know if any system would process things like bell characters or right-to-left control characters...

Unrelated note: I had to file a bug last month because OpenWrt's web interface kept accepting more than that, which stopped wireless from coming back up when you tried. JavaScript length checks are weird.
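One way such a check could go wrong (an assumption for illustration, not a diagnosis of the OpenWrt bug): JavaScript's `String.length` counts UTF-16 code units, while the SSID limit is in bytes. The sketch below reproduces the three different "lengths" in Python; `length_report` is a hypothetical helper:

```python
def length_report(s: str) -> dict:
    """Compare three common notions of string length."""
    return {
        "code_points": len(s),                           # Python's len()
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # what JavaScript's .length counts
        "utf8_bytes": len(s.encode("utf-8")),            # what the 32-byte limit applies to
    }


name = "\U0001F600" * 10  # ten 4-byte emoji
report = length_report(name)
print(report)
```

A form that only checks for at most 32 UTF-16 code units would accept this name, even though it needs 40 bytes on the wire.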

I work on some ecommerce sites. I've had to cancel orders because the order exports can't handle emoji in the fields. I can't wait until baby names have actual emoji in them. I bet some idiot has already tried it.
Obligatory XKCD: https://xkcd.com/327/