Generator For Fullwidth Characters (2011) | HN Mirror

	Generator For Fullwidth Characters (2011) (linkstrasse.de)
	30 points by Ashuu 4500 days ago

8 comments

vorg 4500 days ago

山卄凡丅勺口丫口凵刀㠪㠪勺千凵⻌⻌一山工勺丅卄⻌凡丅工刀匚卄凡尺凡匚丅㠪尺丂千口尺山卄㠪刀丫口凵匚凡刀丁凵丂丅凵丂㠪凡厶㠪尺丫丂爪凡⻌⻌丂凵乃丂㠪丅口千丅卄㠪７５，０００千凵⻌⻌一山工勺丅卄凵刀工卄凡刀口刀㠪丂丅卄凡丅爪凡长㠪凵尸口厶㠪尺丅山口一丅卄工尺勺丂口千凡⻌⻌丅卄㠪凡丂丂工呂刀㠪勺匚卄凡尺凡匚丅㠪尺丂工刀丅卄㠪彑凵工尺长丫凵刀工匚口勺㠪己口口？

Gigablah 4500 days ago

Someone is going to make a font out of this and it'll be all your fault.

ggreer 4500 days ago

This post is from 2011. I can't remember any result in the past three years that used this trick, which makes me suspect that Google has fixed the issue. Still, it'd be nice to know more. I haven't found any posts on this topic besides the one already linked to.

jrabone 4500 days ago

Google might have fixed it but Facebook haven't - the sidebar on the desktop site is full of bottom-feeder adverts for weight loss, hot dates etc. using non-ASCII lookalike glyphs (usually accents or composing diacritics). I wonder if they are working around some Facebook ban on certain ads?

Spammers are still using this trick in email (usually subject lines) - I actually started writing a decomposer / normaliser plugin for SpamAssassin, then realised it was cheaper to just penalise Unicode-encoded subjects. This is why we can't have nice things.
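For illustration, the decompose-and-normalise idea can be sketched in a few lines of Python. This is not the SpamAssassin plugin mentioned above (which was never shown); the function names `skeleton` and `looks_disguised` are made up for this example. It also flags legitimate accented words such as "café", so treat the result as one signal among many rather than a verdict.

```python
import unicodedata


def skeleton(text: str) -> str:
    """NFKD-decompose, then drop combining marks (accents, composing diacritics)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def looks_disguised(subject: str) -> bool:
    """Flag text that is not ASCII as written but becomes pure ASCII once normalised."""
    return not subject.isascii() and skeleton(subject).isascii()


# Hypothetical subject lines, only for demonstration.
print(looks_disguised("Ｗｅｉｇｈｔ ｌｏｓｓ"))  # fullwidth letters
print(looks_disguised("Weight loss"))        # plain ASCII
```

Text in other scripts (Chinese, Cyrillic and so on) does not collapse to ASCII under this check, so it is not flagged merely for being non-ASCII.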

adamlj 4500 days ago

I'm not sure Google is actively punishing this strategy but the text doesn't look good at all in the SERP. The title of the page is displayed as "Ｕｎｉｃｏｄｅ－Ｆｕｌｌｗｉｄｔｈ－Ｚｅｉｃｈｅｎ" [0]

[0] https://google.com/search?q=site:linkstrasse.de

vog 4500 days ago

> which makes me suspect that Google has fixed the issue

It would be interesting to know if Google is actively punishing such sites (low page rank, or not showing those at all), as it does with many other nasty SEO tricks.

andmarios 4500 days ago

I think these sites are punishing themselves. Using this trick would repulse more visitors than it would attract.

Jamie452 4500 days ago

I'm confused how this trick can benefit SEO?

chrisfarms 4500 days ago

It wouldn't have done anything for ranking ... but I guess the idea was to increase clicks the same way people used to love a good...

    `·.¸¸.·´´¯`··._.· My Homepage `·.¸¸.·´´¯`··._.·

..type <title> banner to get attention.

Gigablah 4500 days ago

Notice that if you search for the word "Unicode" on this page, Chrome highlights it with no problem. I presume it's the same for other modern browsers.

dewiz 4500 days ago

Firefox 28 for Win: characters are not rendered with fixed-width spacing and words are not recognized. IE 11: well supported, renders as fixed width and words can be found with the search function, as in Chrome.

I'm a bit surprised :) tbh I was quite expecting the opposite.

Svip 4500 days ago

Not Firefox, however. I assume it treats them as special characters, even though their appearance is clearly similar to the regular Latin characters they represent.

I assume Mozilla has a reason for this choice.

erichurkman 4500 days ago

This stuff is pretty nice for tabular data. Now, though, you can just use CSS.

  td.tabular {
    -moz-font-feature-settings: "tnum";
    -webkit-font-feature-settings: "tnum";
    font-feature-settings: "tnum";
  }

alexdowad 4500 days ago

I have zero interest in SEO, but as a developer, I am interested in how to write text-search and text-matching functions which treat "ordinary" and full-width Latin text consistently. Does anyone know how to do this?

taejo 4500 days ago

The Unicode compatibility mappings (NFKC and NFKD) turn fullwidth Latin characters into ordinary Latin characters. [Wikipedia](https://en.wikipedia.org/wiki/Unicode_equivalence) says:

> In order to compare or search Unicode strings, software can use either composed or decomposed forms; this choice does not matter as long as it is the same for all strings involved in a search, comparison, etc. On the other hand, the choice of equivalence criteria can affect search results. For instance some typographic ligatures like U+FB03 (ﬃ), roman numerals like U+2168 (Ⅸ) and even subscripts and superscripts, e.g. U+2075 (⁵) have their own Unicode code points. Canonical normalization (NF) does not affect any of these, but compatibility normalization (NFK) will decompose the ffi ligature into the constituent letters, so a search for U+0066 (f) as substring would succeed in an NFKC normalization of U+FB03 but not in NFC normalization of U+FB03. Likewise when searching for the Latin letter I (U+0049) in the precomposed Roman Numeral Ⅸ (U+2168). Similarly the superscript "⁵" (U+2075) is transformed to "5" (U+0035) by compatibility mapping.

Any good Unicode library should support normalization. For example in Python:

   >>> import unicodedata
   >>> unicodedata.normalize('NFKD', u'ｆｕｌｌｗｉｄｔｈ－ｃｏｎｖｅｒｔｅｒ')
   u'fullwidth-converter'
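To turn that into the text-matching function asked about above, one option is to normalize both the search term and the text the same way before comparing. The sketch below assumes Python 3 and adds case folding on top of NFKC; the helper names `fold` and `contains` are hypothetical, and this is a starting point rather than a complete caseless-matching implementation.

```python
import unicodedata


def fold(text: str) -> str:
    """Apply NFKC compatibility normalization, then casefold for caseless matching."""
    return unicodedata.normalize("NFKC", text).casefold()


def contains(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack once both are folded the same way."""
    return fold(needle) in fold(haystack)


# Fullwidth text matched by an ordinary query, and the other way round.
print(contains("Ｕｎｉｃｏｄｅ－Ｆｕｌｌｗｉｄｔｈ－Ｚｅｉｃｈｅｎ", "unicode"))
print(contains("plain unicode text", "Ｕｎｉｃｏｄｅ"))
```

The important part is that both sides go through the same `fold`; if you index text, store the folded form and fold every query before lookup.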

Jamie452 4500 days ago

What amazed me the most was that the text works fine in the browser address bar!

I'm guessing we're going to see a torrent of HN posts using this trick to get more exposure in their titles!

Ｊａｍｉｅ

dewiz 4500 days ago

..which will lead the admins to introduce a global char replace function to keep the site clean :)
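Such a replace function would be short. A rough sketch in Python, assuming the only goal is to map the fullwidth ASCII variants back to ordinary characters (this is not anything HN is known to run, and `clean_title` is a made-up name):

```python
# Fullwidth forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at a fixed offset.
OFFSET = 0xFEE0
FULLWIDTH_TO_ASCII = {cp: cp - OFFSET for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_ASCII[0x3000] = 0x20  # ideographic space -> ordinary space


def clean_title(title: str) -> str:
    """Replace fullwidth ASCII variants with their ordinary counterparts."""
    return title.translate(FULLWIDTH_TO_ASCII)


print(clean_title("Ｊａｍｉｅ"))
```

Unlike a blanket NFKC pass, this leaves everything else (accents, CJK text, ligatures) untouched, which may or may not be what a site wants.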

badman_ting 4500 days ago

Ha, I've been using this page for years to write dumb stuff on twitter.

xxxmadraxxx 4500 days ago

Interesting. But, once for demonstration purposes is enough. Please can we not have every submitted headline on HN avail of this trick from now on. It's bad enough having to watch the evolution of headlines along the lines of "Meteorite Seen in Background of Sky-Diving Photo" into "OMG! Flaming Fireball Almost Decapitates Parachutist!" by click-junkie contributors, without needing to buy a wider monitor just so I can fit the damned headlines onto my screen.