Hacker News
by tomjen3 4336 days ago
I am a Dane. We use the Latin alphabet plus æ, ø and å. Some names, like Søren or Åse, can't be written in pure ASCII, but the people who have these names tend to just have addresses like soren@whatever.dk or soeren@whatever.dk. It seems preposterous to me to risk breaking the web over something so relatively trivial.

Heck, my name is ASCII-compatible, but it isn't available.
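The two address styles mentioned above (soren vs. soeren) correspond to two common ASCII fallbacks for Danish letters: dropping the diacritic, or spelling the letter out. A minimal sketch, assuming hypothetical mapping tables `SIMPLIFY` and `TRANSLITERATE` and a made-up helper `ascii_local_part`; real conventions vary by person and provider:

```python
# Hypothetical mapping tables for illustration only; not an official standard.
SIMPLIFY = str.maketrans({"æ": "ae", "ø": "o", "å": "a",
                          "Æ": "Ae", "Ø": "O", "Å": "A"})
TRANSLITERATE = str.maketrans({"æ": "ae", "ø": "oe", "å": "aa",
                               "Æ": "Ae", "Ø": "Oe", "Å": "Aa"})


def ascii_local_part(name: str, table: dict) -> str:
    """Return a lowercase ASCII approximation of a Danish name."""
    candidate = name.translate(table).lower()
    # Raises UnicodeEncodeError if anything outside ASCII is left over.
    candidate.encode("ascii")
    return candidate


print(ascii_local_part("Søren", SIMPLIFY))       # soren
print(ascii_local_part("Søren", TRANSLITERATE))  # soeren
```

Note that the mapping is lossy: once the name is "soren", nothing in the address says whether the original was Søren or Soren.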


A lot of cases of "accented characters" are much simpler, compared to e.g. Chinese or Japanese, where the mapping is not 1-1 (a given kanji could have multiple readings, and for Chinese, pinyin is not 1-1 either). There are also a number of people who don't know the ASCII mapping for their language. Chinese can be written with bopomofo or 5-stroke input, for example. There are programs for input of Indic languages that use visual keyboards.

Let's say the Internet had been invented in Japan instead of the US. How would you feel if people told you that you had to write your name in katakana everywhere? As another commenter mentioned, internationalization is here to stay, and if we want to expand to the next few billion users it's even more important. FWIW internationalized usernames are already available on a number of non-email platforms (Weibo as a prime example). For email to remain competitive, it's important to keep up in the internationalization space.

Well, internationalized usernames are "available" on Weibo in the sense that your displayed "name" can be anything you want. But you don't log in with your displayed name; it's an arbitrary bit of account data, and you can change it whenever you want. You log in with an email address, which is how the system identifies you.

(Checking now just to make sure, I see that Weibo allows three options for logging in: an email address (not internationalized), an account number (not internationalized), and a phone number (not internationalized).)

I admit I don't understand the downvote. Email already has internationalization in the same sense as Weibo does. You might receive email from me as 'From: Michael Watts <i.made.this.up@hotmail.com>'; the email address doesn't support arbitrary characters, but the name does (I've received email from '"=?gb18030?B?w8DIy7nY?=" <XXXXXXXXX@qq.com>', which worked out to a displayed name of 美人关). Similarly, if I wanted to display 美人关 as my handle on Weibo, I could do that, but I wouldn't be able to use it to identify my account.
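
For anyone curious how that display name travels over the wire: the `=?charset?B?...?=` form is an RFC 2047 encoded-word, where `B` means the payload is base64. A minimal sketch of decoding the example above with Python's standard `email.header` module (the helper name `decode_display_name` is made up for illustration):

```python
from email.header import decode_header


def decode_display_name(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value into a plain str."""
    parts = []
    for payload, charset in decode_header(raw):
        if isinstance(payload, bytes):
            # Encoded-words come back as bytes plus the declared charset;
            # fall back to ASCII when no charset was declared.
            parts.append(payload.decode(charset or "ascii", errors="replace"))
        else:
            # Headers without any encoded-word are returned as str unchanged.
            parts.append(payload)
    return "".join(parts)


name = decode_display_name("=?gb18030?B?w8DIy7nY?=")
print(name)  # 美人关
```

This only covers the display name. The address itself (the part in angle brackets) is not covered by this mechanism, which is the distinction being drawn above.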

The web is currently broken for not supporting stuff like that. I imagine over half of the world's population can't write their name in ASCII-only characters; that's pretty inexcusable. Internationalization is here to stay, and it's our job as software engineers to support it everywhere.

Your argument would be a lot more useful and falsifiable if you put it in cost-benefit terms. I'm not so convinced that having a few billion people use ASCII approximations is "inexcusable", but at least if you said "the worldwide benefits are worth the implementation costs", that could be wrong or right.

His comment is plenty useful. We've had computers around long enough that one doesn't need to provide a cost-benefit analysis to justify saying "this is stupid". "Check this out, I've got a box that can play movies. It can immerse you in a 3D video environment that you can interact with. You can talk to people thousands of miles away for free. It allows you access to much of the world's knowledge. It can solve numeric problems that would take years to solve by hand."

<a large portion of the world's population responds> "Yeah, that's neat and all. How come when I type my name all I see on the screen are squares?"

Perhaps it doesn't qualify for the "inexcusable" tag, but it sure seems pretty broken. "It's always been that way" doesn't strike me as a very good resolution reason for the bug.

It's not trivial. This is 2014, not 1960. If computers should do one thing correctly, it should be to display text. As it happens, only a small subset of languages can be displayed correctly in email addresses. It's completely and utterly ridiculous.

Sure, it's understandable, as email is an old and widely used protocol, so changes are difficult to push through. But it's not acceptable, and it's not "relatively trivial".

There's no risk of "breaking the web", since the web is already broken for billions of people.

If email can't be made to support UTF-8, then email should be replaced altogether.

The major problems are not really technical in nature: homographs, Unicode madness (normalization etc.), and the biggest problem of all, input methods. It would be highly ironic if "international", "more global" email led to more nationalized islands because only local people can input those "international" email addresses. A single global character set (in the non-technical sense) is required for a system to be really global. You might argue that ASCII being that global set is Eurocentric, but there aren't really any good alternatives available. AFAIK pretty much every computer can input ASCII with relative ease, no matter how exotic its users' native script is.
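
To make the normalization and homograph points concrete, here is a small sketch using Python's standard `unicodedata` module. The sample strings are made up for illustration; the point is only that two byte-different spellings of one name can be reconciled by normalization, while look-alike letters from different scripts cannot:

```python
import unicodedata

# Two encodings of "Åse": precomposed Å (U+00C5) vs. A + combining ring (U+030A).
precomposed = "\u00c5se"
decomposed = "A\u030ase"

# The raw strings differ, but NFC normalization makes them identical.
same_before = precomposed == decomposed
same_after = (unicodedata.normalize("NFC", precomposed)
              == unicodedata.normalize("NFC", decomposed))

# Homograph: Latin "a" (U+0061) vs. Cyrillic "а" (U+0430) usually render alike,
# and normalization does not merge them.
latin = "pay"
mixed = "p\u0430y"
still_distinct = (unicodedata.normalize("NFKC", latin)
                  != unicodedata.normalize("NFKC", mixed))

print(same_before, same_after, still_distinct)  # False True True
```

So a mail system that accepted arbitrary Unicode addresses would need to pick a normalization form and still have some separate policy for look-alike characters.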