This post is from 2011. I can't remember any result in the past three years that used this trick, which makes me suspect that Google has fixed the issue. Still, it'd be nice to know more. I haven't found any posts on this topic besides the one already linked to.
Google might have fixed it but Facebook haven't - the sidebar on the desktop site is full of bottom-feeder adverts for weight loss, hot dates etc. using non-ASCII lookalike glyphs (usually accents or combining diacritics). I wonder if they are working around some Facebook ban on certain ads?
Spammers are still using this trick in email (usually subject lines) - I actually started writing a decomposer / normaliser plugin for SpamAssassin, then realised it was cheaper to just penalise Unicode-encoded subjects. This is why we can't have nice things.
I'm not sure Google is actively punishing this strategy but the text doesn't look good at all in the SERP. The title of the page is displayed as "Unicode - Fullwidth - Zeichen" [0]
> which makes me suspect that Google has fixed the issue
It would be interesting to know if Google is actively punishing such sites (low page rank, or not showing those at all), as it does with many other nasty SEO tricks.
Notice that if you search for the word "Unicode" on this page, Chrome highlights it with no problem. I presume it's the same for other modern browsers.
Firefox 28 for Win: the characters are not rendered at a fixed width, and the search function does not find the words.
IE 11: well supported, renders as fixed size and words can be found with the search function as in Chrome.
I'm a bit surprised :) tbh I was quite expecting the opposite.
Not Firefox, however. I assume it treats them as special characters, even though they clearly look similar to the regular Latin characters they represent.
I have zero interest in SEO, but as a developer, I am interested in how to write text-search and text-matching functions which treat "ordinary" and full-width Latin text consistently. Does anyone know how to do this?
> In order to compare or search Unicode strings, software can use either composed or decomposed forms; this choice does not matter as long as it is the same for all strings involved in a search, comparison, etc. On the other hand, the choice of equivalence criteria can affect search results. For instance some typographic ligatures like U+FB03 (ffi), roman numerals like U+2168 (Ⅸ) and even subscripts and superscripts, e.g. U+2075 (⁵) have their own Unicode code points. Canonical normalization (NF) does not affect any of these, but compatibility normalization (NFK) will decompose the ffi ligature into the constituent letters, so a search for U+0066 (f) as substring would succeed in an NFKC normalization of U+FB03 but not in NFC normalization of U+FB03. Likewise when searching for the Latin letter I (U+0049) in the precomposed Roman Numeral Ⅸ (U+2168). Similarly the superscript "⁵" (U+2075) is transformed to "5" (U+0035) by compatibility mapping.
Any good Unicode library should support normalization. For example, in Python:
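A minimal sketch using the standard-library `unicodedata` module. The helper names `fold` and `contains` are made up for illustration, and combining NFKC with `str.casefold()` is just one reasonable choice of equivalence, not the only one:

```python
import unicodedata


def fold(text: str) -> str:
    """Build a search key: compatibility-normalize (NFKC), then case-fold.

    Hypothetical helper. NFKC maps full-width letters, ligatures and
    superscripts to their ordinary counterparts; casefold() then removes
    case distinctions.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def contains(haystack: str, needle: str) -> bool:
    """Substring test that applies the same folding to both strings."""
    return fold(needle) in fold(haystack)


# "Unicode" spelled with full-width Latin letters (U+FF21..U+FF5A block).
fullwidth = "\uFF35\uFF4E\uFF49\uFF43\uFF4F\uFF44\uFF45"

# Canonical normalization (NFC) leaves full-width letters untouched...
print(unicodedata.normalize("NFC", fullwidth) == fullwidth)  # True
# ...while compatibility normalization (NFKC) maps them to ASCII.
print(unicodedata.normalize("NFKC", fullwidth))              # Unicode

print(contains(fullwidth, "unicode"))           # True
print(contains("o\uFB03ce", "ffi"))             # True (U+FB03 is the ffi ligature)
print(unicodedata.normalize("NFKC", "\u2075"))  # 5 (U+2075 is superscript five)
```

One caveat: the folded string can differ in length from the original (the ffi ligature becomes three characters), so match offsets in the folded text don't map directly back to the original. If you need to highlight hits, you'd have to track the mapping yourself.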
Interesting. But, once for demonstration purposes is enough. Please can we not have every submitted headline on HN avail of this trick from now on. It's bad enough having to watch the evolution of headlines along the lines of "Meteorite Seen in Background of Sky-Diving Photo" into "OMG! Flaming Fireball Almost Decapitates Parachutist!" by click-junkie contributors, without needing to buy a wider monitor just so I can fit the damned headlines onto my screen.