by aboutruby 2631 days ago
> If your app or website uses a Unicode character which isn't supported on a device, the user will usually see � - a replacement character. If you include Unifont, they'll see the correct character.

Neat idea. I think the transition to UTF-8 is practically done; I'm not seeing � anymore these days (it used to be extremely common a while back).

2 comments

This line is largely wrong.

Most systems, when asked to display a character they're unable to render, will render a placeholder. This is most often a dotted box of some sort, roughly the size of a large character. In some systems the box (assuming it's large enough to be readable) contains the number of the Unicode code point that the system couldn't render. In a few, the box contains some representative symbol that hints at what sort of thing is missing, e.g. a Han glyph to suggest that you should look for a Chinese font.

I haven't seen any systems (they may exist, of course) that render U+FFFD, the replacement character �, in this situation.

The most common reason to see U+FFFD is the reason it was created: something was encoded or decoded in a way that produces gibberish, and the best option in that case is to replace the smallest chunk of gibberish with U+FFFD and then keep going. On the Web you'd often see pages which claimed to be UTF-8 but were actually ISO-8859-1 or Windows codepage 1252. Neither of those is UTF-8, but they share the most common Latin characters with it. These days most browsers will auto-detect this goof, and besides, most web pages really are UTF-8. But when browsers were less good at guessing and more pages were wrong, you'd see it more often.
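
To make the mislabeled-page case concrete, here is a minimal Python sketch. The sample text and the helper name decode_with_fallback are made up for illustration, and the fallback is only a crude stand-in for the guessing a browser does, not how any particular browser actually works.

```python
# Hypothetical illustration: the sample text and helper name are made up.

# A page saved as ISO-8859-1 but served with a "charset=utf-8" label.
latin1_bytes = "café au lait".encode("iso-8859-1")  # b'caf\xe9 au lait'

# A strict UTF-8 decoder rejects the lone 0xE9 byte outright.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder swaps the invalid byte for U+FFFD and keeps going.
lenient = latin1_bytes.decode("utf-8", errors="replace")


def decode_with_fallback(data: bytes) -> str:
    """Try UTF-8 first; if that fails, assume Windows-1252 (an assumption)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")


recovered = decode_with_fallback(latin1_bytes)

print(strict_ok)
print(lenient)
print(recovered)
```

The lenient decode is what produces the � that readers used to see; the fallback shows why a decent guess at the real encoding makes it disappear without any change to the fonts involved.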

Yup, I screwed up with that title! See the discussion at https://twitter.com/FakeUnicode/status/1113774985116434433
Eliminating the replacement character is a function of glyph coverage in fonts, not of UTF-8 use.
UTF-8 helped with characters like “ ”. Back before it was widely used, all those sites with text pasted from MS Word didn't display well.
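
A rough Python sketch of the MS Word quote problem, assuming the pasted text arrived as Windows-1252 bytes (the sample string is made up):

```python
# Hypothetical illustration: curly quotes as they might arrive from a
# Windows-1252 source such as text pasted from a word processor.
text = "\u201chi\u201d"  # “hi”

cp1252_bytes = text.encode("cp1252")  # one byte per quote: 0x93 and 0x94
utf8_bytes = text.encode("utf-8")     # three bytes per quote

# Read as UTF-8, the cp1252 quote bytes are invalid and become U+FFFD.
as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# Read as ISO-8859-1, they silently become invisible C1 control characters.
as_latin1 = cp1252_bytes.decode("iso-8859-1")

# Encoded and decoded as UTF-8 on both ends, the quotes survive.
round_trip = utf8_bytes.decode("utf-8")

print(cp1252_bytes)
print(utf8_bytes)
print(round_trip == text)
```

Neither misreading has anything to do with font coverage: the quote glyphs exist in nearly every font, but the bytes never decode to the quote code points in the first place.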