Hacker News
by dagw 4764 days ago
What's really depressing is I've had that happen when ordering from a Swedish company. I can kind of understand that US companies get it wrong, but when a company can't deal with all the letters in the alphabet of the country it's based in you know something is fucked.

One of the issues is that web browsers can be inconsistent about how they encode the data they send, and depending on how much cross-browser testing was done, that can yield surprises.

For instance, the "unicode snowman" exists because MSIE 5-8 will refuse to send a form as UTF-8 (completely ignoring `accept-charset`) if it can encode everything as Latin-1. Conversely, most other browsers will default to UTF-8 (though I believe normalization may vary). If the system was built in the early '00s and only tested in MSIE, it might well expect all input data as Latin-1 (because that seemed to work at the time) and crap out when UTF-8 comes in.
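To make that failure concrete, here is a minimal Python sketch (not from the thread) of a backend decoding form bytes with the wrong charset. The word `räksmörgås` is just a sample string picked for its å, ä and ö; the variable names are illustrative.

```python
word = "räksmörgås"  # sample Swedish string containing å, ä and ö

utf8_bytes = word.encode("utf-8")      # roughly what most browsers submit
latin1_bytes = word.encode("latin-1")  # what MSIE 5-8 submitted when everything fit in Latin-1

# Backend assumes Latin-1 but receives UTF-8: no error, just mojibake.
print(utf8_bytes.decode("latin-1"))  # rÃ¤ksmÃ¶rgÃ¥s

# Backend assumes UTF-8 but receives Latin-1: decoding fails outright,
# because 0xE4 ('ä' in Latin-1) is not followed by a valid UTF-8 continuation byte.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

The first mistake is the sneaky one: nothing raises, so garbled names end up stored and printed on shipping labels.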

What does "unicode snowman" have to do with this?
Some websites now include a hidden input field in all their forms, `<input type="hidden" name="snowman" value="&#9731;" />`, to convince IE that it's supposed to be sending UTF-8, not Latin-1 (and so the site can recognize whether the input was likely mangled).
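A minimal server-side sketch of that check in Python, assuming the hidden field is named `snowman` and the body arrives as `application/x-www-form-urlencoded`; `decode_form` is a hypothetical helper, not any framework's API. If the field decodes to ☃, the browser really sent UTF-8. If it comes back as the literal text `&#9731;` (for instance when the page was served as Latin-1), the browser could not encode the snowman in its chosen charset and, as browsers generally do, substituted a numeric character reference, so the other fields are probably not UTF-8 either.

```python
from urllib.parse import parse_qs

SNOWMAN = "\u2603"  # ☃, the value the hidden field is expected to carry


def decode_form(body: str) -> tuple[dict[str, list[str]], bool]:
    """Hypothetical helper: decode a urlencoded form body as UTF-8 and
    report whether the hidden 'snowman' field came through intact."""
    # errors="replace" turns invalid UTF-8 (e.g. Latin-1 bytes) into U+FFFD
    # instead of raising, leaving the decision to the caller.
    fields = parse_qs(body, encoding="utf-8", errors="replace")
    looks_like_utf8 = fields.get("snowman") == [SNOWMAN]
    return fields, looks_like_utf8


# UTF-8 submission: the snowman survives and the name decodes cleanly.
fields, ok = decode_form("snowman=%E2%98%83&name=r%C3%A4ksm%C3%B6rg%C3%A5s")
print(ok, fields["name"])  # True ['räksmörgås']

# Latin-1 submission: the snowman arrives as a numeric character reference.
fields, ok = decode_form("snowman=%26%239731%3B&name=r%E4ksm%F6rg%E5s")
print(ok, fields["snowman"])  # False ['&#9731;']
```

What to do with a `False` result (reject, or re-decode as Latin-1) is a policy choice the sketch leaves open.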

It's built into Rails, except they use utf8=✓ now.