Generator For Fullwidth Characters (2011) (linkstrasse.de)
30 points by Ashuu 4453 days ago
8 comments

山卄凡丅 勺口 丫口凵 刀㠪㠪勺 千凵⻌⻌一山工勺丅卄 ⻌凡丅工刀 匚卄凡尺凡匚丅㠪尺丂 千口尺 山卄㠪刀 丫口凵 匚凡刀 丁凵丂丅 凵丂㠪 凡 厶㠪尺丫 丂爪凡⻌⻌ 丂凵乃丂㠪丅 口千 丅卄㠪 75,000 千凵⻌⻌一山工勺丅卄 凵刀工卄凡刀 口刀㠪丂 丅卄凡丅 爪凡长㠪 凵尸 口厶㠪尺 丅山口一丅卄工尺勺丂 口千 凡⻌⻌ 丅卄㠪 凡丂丂工呂刀㠪勺 匚卄凡尺凡匚丅㠪尺丂 工刀 丅卄㠪 彑凵工尺长丫 凵刀工匚口勺㠪 己口口?
Someone is going to make a font out of this and it'll be all your fault.
This post is from 2011. I can't remember any result in the past three years that used this trick, which makes me suspect that Google has fixed the issue. Still, it'd be nice to know more. I haven't found any posts on this topic besides the one already linked to.
Google might have fixed it but Facebook haven't - the sidebar on the desktop site is full of bottom-feeder adverts for weight loss, hot dates etc. using non-ASCII lookalike glyphs (usually accents or combining diacritics). I wonder if they are working around some Facebook ban on certain ads?

Spammers are still using this trick in email (usually subject lines) - I actually started writing a decomposer / normaliser plugin for SpamAssassin, then realised it was cheaper to just penalise Unicode-encoded subjects. This is why we can't have nice things.
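
For anyone curious what the decomposing step might look like, here is a minimal Python sketch (not the SpamAssassin plugin itself, which would be Perl). The function name `strip_marks` is made up for illustration. It only handles the accent / combining-diacritic flavour of the trick, and it would also flatten legitimate accented text, so treat it as a starting point only.

```python
import unicodedata

def strip_marks(text):
    """Hypothetical helper: decompose text, then drop combining marks."""
    # NFKD splits precomposed letters such as U+00E9 into a base letter
    # followed by a combining mark (e + U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_marks("h\u00f6t d\u00e4tes"))  # hot dates
```

Lookalikes borrowed from other scripts (Cyrillic, CJK and so on) have no decomposition to Latin, so they pass through this unchanged.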

I'm not sure Google is actively punishing this strategy but the text doesn't look good at all in the SERP. The title of the page is displayed as "Unicode - Fullwidth - Zeichen" [0]

[0] https://google.com/search?q=site:linkstrasse.de

> which makes me suspect that Google has fixed the issue

It would be interesting to know if Google is actively punishing such sites (low page rank, or not showing those at all), as it does with many other nasty SEO tricks.

I think these sites are punishing themselves. Using this trick would repulse more visitors than it would attract.
I'm confused how this trick can benefit SEO?
It wouldn't have done anything for ranking ... but I guess the idea was to increase clicks the same way people used to love a good...

    `·.¸¸.·´´¯`··._.· My Homepage `·.¸¸.·´´¯`··._.· 
...type <title> banner to get attention.
Notice that if you search for the word "Unicode" on this page, Chrome highlights it with no problem. I presume it's the same for other modern browsers.
Firefox 28 for Win: characters are not rendered at a fixed width and words are not recognized by search. IE 11: well supported, renders at a fixed width and words can be found with the search function as in Chrome.

I'm a bit surprised :) tbh I was quite expecting the opposite.

Not Firefox, however. I assume it treats them as special characters, even though their appearance is clearly similar to the regular Latin characters they represent.

I assume Mozilla has a reason for this choice.

This stuff is pretty nice for tabular data. Now, though, you can just use CSS.

  td.tabular {
    -moz-font-feature-settings: "tnum";
    -webkit-font-feature-settings: "tnum";
    font-feature-settings: "tnum";
  }
I have zero interest in SEO, but as a developer, I am interested in how to write text-search and text-matching functions which treat "ordinary" and full-width Latin text consistently. Does anyone know how to do this?
The Unicode compatibility normalization forms (NFKC and NFKD) turn fullwidth Latin characters into ordinary Latin characters. [Wikipedia](https://en.wikipedia.org/wiki/Unicode_equivalence) says:

> In order to compare or search Unicode strings, software can use either composed or decomposed forms; this choice does not matter as long as it is the same for all strings involved in a search, comparison, etc. On the other hand, the choice of equivalence criteria can affect search results. For instance some typographic ligatures like U+FB03 (ffi), roman numerals like U+2168 (Ⅸ) and even subscripts and superscripts, e.g. U+2075 (⁵) have their own Unicode code points. Canonical normalization (NF) does not affect any of these, but compatibility normalization (NFK) will decompose the ffi ligature into the constituent letters, so a search for U+0066 (f) as substring would succeed in an NFKC normalization of U+FB03 but not in NFC normalization of U+FB03. Likewise when searching for the Latin letter I (U+0049) in the precomposed Roman Numeral Ⅸ (U+2168). Similarly the superscript "⁵" (U+2075) is transformed to "5" (U+0035) by compatibility mapping.

Any good Unicode library should support normalization. For example in Python:

   >>> import unicodedata
   >>> unicodedata.normalize('NFKD', u'ｆｕｌｌｗｉｄｔｈ－ｃｏｎｖｅｒｔｅｒ')
   u'fullwidth-converter'
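
To get from there to an actual search function, one possible sketch is to fold both strings the same way before comparing. The names `fold` and `contains` are my own, not from any library, and this assumes Python 3 (for `str.casefold`). NFKC followed by `casefold()` is a simplification of the caseless matching the Unicode standard describes, so rare edge cases may behave differently.

```python
import unicodedata

def fold(text):
    """Hypothetical helper: reduce text to a form suitable for loose matching."""
    # NFKC maps compatibility characters (fullwidth forms, ligatures,
    # superscripts) to their ordinary equivalents; casefold() then
    # removes case differences.
    return unicodedata.normalize("NFKC", text).casefold()

def contains(haystack, needle):
    """True if needle occurs in haystack after both are folded."""
    return fold(needle) in fold(haystack)

print(contains("Generator for ｆｕｌｌｗｉｄｔｈ characters", "Fullwidth"))  # True
print(contains("o\ufb03ce", "OFFICE"))  # True, the ffi ligature is expanded
```

The important part is that the query and the searched text go through the same folding, as the Wikipedia quote says. If you index text, store the folded form alongside the original rather than replacing it.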
What amazed me the most was that the text works fine in the browser address bar!

I'm guessing we're going to see a torrent of HN posts using this trick to get more exposure in their titles!

Jamie

..which will lead the admins to introduce a global char replace function to keep the site clean :)
Ha, I've been using this page for years to write dumb stuff on twitter.
Interesting. But, once for demonstration purposes is enough. Please can we not have every submitted headline on HN avail of this trick from now on. It's bad enough having to watch the evolution of headlines along the lines of "Meteorite Seen in Background of Sky-Diving Photo" into "OMG! Flaming Fireball Almost Decapitates Parachutist!" by click-junkie contributors, without needing to buy a wider monitor just so I can fit the damned headlines onto my screen.