Hacker News
A first step toward more global email (googleblog.blogspot.com)
55 points by aseidl 4336 days ago
14 comments

What are the security implications?

Does anyone know how canonicalization is handled? Does every mail program need to know how to precompose/decompose etc? How do you protect against impersonation using look-alike letters?

This is, as far as I know, not yet a solved problem even at the domain name level[0], and it's likely to open a whole new can of worms at the account level.

[0] http://en.wikipedia.org/wiki/IDN_homograph_attack
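For anyone wondering what precompose/decompose means in practice, here is a minimal sketch using Python's standard `unicodedata` module. The `canonical` helper is a hypothetical name, and NFC is only one possible choice of canonical form; nothing here is claimed to be what any mail provider actually does.

```python
import unicodedata

# The same visible letter can be stored two ways:
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

def canonical(address: str) -> str:
    """Hypothetical helper: normalize to NFC before comparing addresses."""
    return unicodedata.normalize("NFC", address)

# The raw strings differ, but they compare equal after normalization.
same_after_nfc = canonical(composed) == canonical(decomposed)

# Normalization does NOT help with look-alikes from different scripts:
# Cyrillic 'а' (U+0430) stays distinct from Latin 'a' even under NFKC.
cyrillic_a = "\u0430"
still_distinct = unicodedata.normalize("NFKC", cyrillic_a) != "a"
```

So normalization covers the canonicalization question, while impersonation with look-alike letters needs a separate check.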

Something like hashing or petnames, maybe.
One small fix would be to mark non-latin characters in an email address.
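A rough sketch of what such marking could look like, assuming "Latin" is judged by the character's Unicode name. The function names and the bracket notation are made up for illustration.

```python
import unicodedata

def is_latin_or_neutral(ch: str) -> bool:
    """Digits, punctuation and combining marks pass through untouched;
    letters pass only if their Unicode name starts with 'LATIN'."""
    if not ch.isalpha():
        return True
    return unicodedata.name(ch, "").startswith("LATIN")

def mark_non_latin(address: str) -> str:
    """Wrap every non-Latin letter in square brackets."""
    return "".join(
        ch if is_latin_or_neutral(ch) else f"[{ch}]" for ch in address
    )

# The second letter here is Cyrillic 'а' (U+0430), not Latin 'a'.
print(mark_non_latin("p\u0430ypal@example.com"))
```

Accented Latin letters such as ø or å count as Latin under this rule, so they are not flagged.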
I have a "non-latin" letter in my surname, and I find highlighting it as somehow wrong or suspicious offensive.
This is very useful for non-native English speakers who use non-Latin characters. I have to spell my email address out letter by letter and check what people wrote whenever I need to give it to someone.

If my address were a word in Hebrew, or something people can pronounce and write, it would actually help, saving time and avoiding misspellings.

But this should be an alias, or it should be very easy to create one. I have already had my email address for some years, and I don't want to create a new one and handle two accounts unnecessarily.

Also, let's say someone creates his email address in his native language so that it is easier to give to his friends at school. Later in life he wants to give his address to a VC from the US he met. Even if he hands over a business card with his email printed in clear letters, this Western person can't spell it or even type it; unless he can copy/paste the address, he won't be able to send him an email.

Creating a second Gmail account and setting it up to forward to your primary account is already a pretty straightforward procedure, so that probably won't be much of a problem. (There would presumably be a need for some user education on how to do that, though. And I agree that it might be nice to have a built-in "aliases" feature in Gmail, regardless of internationalization.)
Quick question: when characters appear as blocks on my screen, they can still be copy-pasted, right? I would imagine so, since the required font is missing but the data, which is the important part, is still available.
The unfortunate answer is "it depends."

There are three reasons why text might appear blocked out: corrupt data (we'll ignore this), data in formats the text renderer doesn't "understand," and data in formats it DOES understand but doesn't have the prerequisite fonts to draw.

Typically when it is a lack-of-fonts issue you can copy it back out and paste it elsewhere and it will work fine as the data's consistency is kept.

But when the text renderer literally doesn't understand the underlying data (either because it is misconfigured or doesn't fully support Unicode), when you try to copy back out you'll often get a corrupted version of the underlying data which cannot be reused (e.g. a 2-byte character treated as two 1-byte characters).

That issue is less and less common in 2014, as MOST text renderers support 2-byte Unicode characters even if they don't have the fonts to render the full sets. But around the year 2000, it was fairly common to run across a text editor which under the hood was just ASCII, and it would irrecoverably corrupt any Unicode input.
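To make the "2-byte character treated as two 1-byte characters" case concrete, here is a small sketch in Python. It assumes UTF-8 as the underlying encoding and Latin-1 or ASCII as the wrong interpretation, which is only one of many ways this can go wrong.

```python
text = "é"

# In UTF-8 this single character is stored as two bytes.
utf8_bytes = text.encode("utf-8")          # b'\xc3\xa9'

# A program that assumes one byte per character shows two characters.
garbled = utf8_bytes.decode("latin-1")     # 'Ã©'

# As long as the bytes were kept intact, the damage is reversible.
recovered = garbled.encode("latin-1").decode("utf-8")

# An ASCII-only program that replaces what it can't handle loses the
# data for good: both bytes become U+FFFD replacement characters.
lossy = utf8_bytes.decode("ascii", errors="replace")
```

The first case is the "copy it back out and it works" situation; the second is the irrecoverable one.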

First of all, I am quite aware that funny remarks are frowned upon on HN and I agree with this policy. That said, I think that http://www.bash.org/?244321 is still the correct and informative answer.
That's not relevant at all to that person's question. Did you link the right thing? That is the generic password "joke."
Maybe this is a good thing but I wouldn't have an email address with accented characters (and my name contains two of them). It would be quite awkward if I gave someone my address and he/she couldn't simply type it with his/her default keyboard configuration.
I agree with this. Also, many years ago when they started allowing Unicode domain names, several of my Japanese friends thought Japan would switch over to Unicode domain names. It's now been several years and I have yet to see a single Japanese website with a Japanese character domain name. (I'm sure someone will pull one up just now ;)

The problem (one problem?) is legacy devices. At the time, most phones and possibly even some browsers limited input in URL fields to ASCII only. The keyboard they'd show when you went to enter a URL didn't provide the input methods for non-ASCII text.

So, having a non-ascii domain would have only lost you customers. Or made it more confusing. As in do I go to mitsubishi.com or 三菱.com? (三菱.jp is registered but googling for site:三菱.jp brings up zero hits)

I'm not saying I'm against non-ascii email names. I'm just suggesting it's not likely to change much anytime soon. Just a guess
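For reference, a non-ASCII domain is carried over the wire in an ASCII "xn--" form, which is roughly what a legacy ASCII-only URL field would have needed. A minimal sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules, so treat it as illustrative only):

```python
domain = "三菱.com"

# Each non-ASCII label is converted to an ASCII-compatible "xn--" label.
ascii_form = domain.encode("idna")

# The conversion is reversible.
round_trip = ascii_form.decode("idna")

print(ascii_form)
print(round_trip)
```

Note that this only covers the domain part; the local part of an email address (before the @) has no such ASCII fallback encoding.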

Right, that's a legit concern. How about an app that quickly loads up all the language keyboards in the OS and lets you pop into one temporarily for the purposes of typing one thing in notepad or just straight into the clipboard?
> How about an app that quickly loads up all the language

That sounds convenient.

/s

Not all at once... just retrieves them for searching and really quick usage. launch the app, type in "m....a....n.." and hit the mandarin option. Your new Chinese friend types in his email address and you pop it into a contact. No settings need to be changed or anything. Is that really such a weird idea?
> Not all at once... just retrieves them for searching and really quick usage. launch the app, type in "m....a....n.." and hit the mandarin option

And now you have two problems. Why would you be searching for mandarin, which is the anglicised-ASCII translation for Guānhuà? More correctly, you should be searching for ㄍㄨㄢ ㄏㄨㄚ.

One would assume that if the OS is set to English, the keyboard options will be listed in English, and if the OS is set to ㄍㄨㄢ ㄏㄨㄚ, the keyboard options will be listed in ㄍㄨㄢ ㄏㄨㄚ. This isn't a problem at all if you know the name in your own language for the language you are seeking.
QR codes to the rescue!
Doesn't help if someone is dictating it to me.
Um, there's a filled dot. Then an empty one. Then there's like three filled ones. No, four. Then another empty one...
For people who are wondering, the example Japanese email address translates to "takeshi@mail.google" (Takeshi is a male, Japanese first name).
I just tried to create a gmail account with non-latin characters and received this message from Google:

    Please use only letters (a-z), numbers, and periods.
EDIT: I guess I missed this crucial sentence

    Of course, this is just a first step and there’s still a ways to go. In the future, we want to make it possible for you to use them to create Gmail accounts.
I am a Dane. We use the Latin alphabet plus æ, ø and å. Some names, like Søren or Åse, can't be written in pure ASCII, but the people who have these names tend to just have addresses like soren@whatever.dk or soeren@whatever.dk. It seems preposterous to me to risk breaking the web over something so relatively trivial.

Heck my name is ascii compatible but it isn't available.

A lot of cases of "accented characters" are much simpler compared to e.g. Chinese or Japanese, where the mapping is not 1-1 (a given kanji could have multiple readings, or for Chinese, pinyin is not 1-1). There are also a number of people who don't know the ASCII mapping for their language. Chinese can be written with bopomofo or 5-stroke input, for example. There are programs for input of Indic languages that use visual keyboards.

Let's say the Internet had been invented in Japan instead of the US. How would you feel if people told you that you had to write your name in katakana everywhere? As another commenter mentioned, internationalization is here to stay, and if we want to expand to the next few billion users it's even more important. FWIW internationalized usernames are already available on a number of non-email platforms (Weibo as a prime example). For email to remain competitive, it's important to keep up in the internationalization space.

Well, internationalized usernames are "available" on weibo in the sense that your displayed "name" can be anything you want. But you don't log in with your displayed name; it's an arbitrary bit of account data, and is changeable whenever you want. You log in with an email address, which is how the system identifies you.

(checking now just to make sure, I see that weibo allows three options for logging in: an email address (not internationalized), an account number (not internationalized), and a phone number (not internationalized))

I admit I don't understand the downvote. Email already has internationalization in the same sense as weibo does. You might receive email from me as 'From: Michael Watts <i.made.this.up@hotmail.com>'; the email address doesn't support arbitrary characters, but the name does (I've received email from '"=?gb18030?B?w8DIy7nY?=" <XXXXXXXXX@qq.com>', which worked out to a displayed name of 美人关). Similarly, if I wanted to display 美人关 as my handle on weibo, I could do that, but I wouldn't be able to use it to identify my account.
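For the curious, that display name is an RFC 2047 "encoded word": charset, then B for base64, then the payload. A small sketch of decoding it with Python's standard `email.header` module:

```python
from email.header import decode_header

encoded_name = "=?gb18030?B?w8DIy7nY?="

# decode_header returns a list of (bytes, charset) pairs;
# this header consists of a single encoded word.
raw_bytes, charset = decode_header(encoded_name)[0]

# Decode the base64 payload using the charset named in the header.
display_name = raw_bytes.decode(charset)

print(charset)        # gb18030
print(display_name)   # 美人关
```

This mechanism only applies to header text such as the display name, not to the address inside the angle brackets.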
The web is currently broken for not supporting stuff like that. I imagine over half of the world population can't write their name in ASCII only characters, that's pretty inexcusable. Internationalization is here to stay, it's our job as software engineers to support it everywhere.
Your argument would be a lot more useful and falsifiable if you put it in cost-benefit terms. I'm not so convinced that having a few billion people use ASCII approximations is "inexcusable", but at least if you said "the worldwide benefits are worth the implementation costs" that could be wrong or right.
His comment is plenty useful. We've had computers around long enough that one doesn't need to provide a cost-benefit analysis to justify saying "this is stupid". "Check this out, I've got a box that can play movies. It can immerse you in a 3D video environment that you can interact with. You can talk to people thousands of miles away for free. It allows you access to much of the world's knowledge. It can solve numeric problems that would take years to solve by hand."

<a large portion of the world's population responds> "Yeah, that's neat and all. How come when I type my name all I see on the screen are squares?"

Perhaps it doesn't qualify for the "inexcusable" tag, but it sure seems pretty broken. "It's always been that way" doesn't strike me as a very good resolution reason for the bug.

It's not trivial. This is 2014, not 1960. If computers should do one thing correctly, it should be to display text. As it happens, only a small subset of languages can be displayed correctly in email addresses. It's completely and utterly ridiculous.

Sure, it's understandable as email is an old and a widely used protocol, so changes are difficult to push through. But it's not acceptable, and it's not "relatively trivial".

There's no risk of "breaking the web", since the web is already broken for billions of people.

If email can't be made to support UTF-8, then email should be replaced altogether.

The major problems are not really technical in nature. Homographs, Unicode madness (normalization etc.), and the biggest problem of all: input methods. It would be highly ironic if "international", "more global" email led to more nationalized islands because only local people can input those "international" email addresses. A single global character set (in the non-technical sense) is required for a system to be really global. You might argue that ASCII being that global set is Eurocentric, but there aren't really any good alternatives available. Afaik pretty much every computer can input ASCII with relative ease no matter how exotic its users' native script is.
Interesting.

Question: What do you guys think it'd take to get the other N% of email providers, clients, servers, and whatnot onboard?

Is this an ipV6-like situation?

Hardly.

Almost all email moves between only a handful of companies. Google, Microsoft, Yahoo, Facebook, Apple. Between them they dominate the landscape. It only takes a handful of engineers and product managers at these companies to decide "let's do this" and pretty quickly such email addresses can become a reality for at least person to person mail.

Of course for them to become usable for signing up to websites, mailing lists etc, will take much longer. But people may not mind having two email addresses if they can put one on their business card.

After a quick glance at the standard, I believe email providers like Google who support SMTPUTF8 could support having two email addresses transparently to the user.

According to the standard: "If the message cannot be forwarded because the next-hop system cannot accept the extension, it MUST be rejected or a non-delivery message MUST be generated and sent."

Google could preserve a backup ASCII email for all UTF-8 email addresses, which would be used in the case the recipient doesn't support SMTPUTF8.

Of course, I don't know if Google has plans to support this.
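A rough sketch of that fallback idea, purely as an illustration. The function, the exception, and the example addresses are all hypothetical; the only real-world detail assumed is that a server advertises the `SMTPUTF8` keyword in its EHLO response (with Python's `smtplib`, the advertised keywords are available as `esmtp_features` after calling `ehlo()`).

```python
from typing import Iterable, Optional

class CannotDeliver(Exception):
    """Raised when the message would have to be rejected or bounced."""

def choose_sender_address(
    utf8_address: str,
    ascii_backup: Optional[str],
    next_hop_extensions: Iterable[str],
) -> str:
    """Pick the address to use for a given next-hop server."""
    extensions = {name.lower() for name in next_hop_extensions}
    # Pure-ASCII addresses work everywhere; otherwise SMTPUTF8 is required.
    if utf8_address.isascii() or "smtputf8" in extensions:
        return utf8_address
    # Next hop can't accept the extension: fall back to the ASCII alias.
    if ascii_backup is not None:
        return ascii_backup
    # No alias on file, so the quoted MUST applies: reject or bounce.
    raise CannotDeliver(f"next hop cannot accept {utf8_address!r}")

modern = choose_sender_address(
    "søren@example.dk", "soren@example.dk", ["SIZE", "8BITMIME", "SMTPUTF8"]
)
legacy = choose_sender_address(
    "søren@example.dk", "soren@example.dk", ["SIZE", "8BITMIME"]
)
```

Here `modern` keeps the UTF-8 address and `legacy` gets the ASCII alias.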

> Almost all email moves between only a handful of companies. Google, Microsoft, Yahoo, Facebook, Apple

That's pretty naive. You need to step outside your comfy bubble.

Out of all the small businesses on the internet, how many of them are still running an email server under someone's desk?

And not to mention, what about all the client-side javascript out there that parses email addresses on web forms? Think about all those throw-away regexes to parse email forms on websites. I've seen a lot of them break just on these new TLDs - and they are ASCII!
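To illustrate the kind of throw-away regex I mean, here is a sketch in Python. The pattern is a made-up but typical example, not taken from any particular site.

```python
import re

# Typical copy-pasted validator: ASCII-only local part and domain,
# and a TLD assumed to be 2 to 4 letters long.
NAIVE_EMAIL_RE = re.compile(
    r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"
)

def naive_is_valid(address: str) -> bool:
    return NAIVE_EMAIL_RE.match(address) is not None

candidates = [
    "alice@example.com",           # accepted
    "alice@example.photography",   # rejected: TLD longer than 4 letters
    "宮本茂@任天堂株式会社.com",      # rejected: non-ASCII characters
]
for address in candidates:
    print(address, naive_is_valid(address))
```

The second address is pure ASCII and still fails, which is the point: forms like this were already broken before internationalized addresses entered the picture.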

This comment is pretty frightening to me. Email is very often used for "mission critical" purposes, not just for exchanging messages with other users (let alone just "with other users of the top 5 providers"). I know that enabling international characters is important for a whole range of reasons, and that's worth some significant costs. But email infrastructure is emphatically not something where "works most of the time" or even "works 95% of the time" is good enough. I think there really is a need for pretty substantial attention to backward compatibility in this case.
While we're at it, can we simplify the grammar of an email address? Does anyone really embed comments in their address?
Out of curiosity, how would I send an email to somebody whose address is in a language I can't type?
In addition to copying the address, you can also use a virtual keyboard to map your existing keyboard to the keyboard of another language. I know that OS X has this built in, and I am assuming Windows does too.
I have a business card and the email address is in a language I don't know how to type (having a virtual keyboard doesn't mean I know how to reproduce the actual characters, which is true for many languages).

Or worse, somebody is trying to give me their address over the phone.

Here's an example: Try to figure out how to type:

宮本茂@任天堂株式会社.com

Here's another I'm pretty certain I can't figure out with a virtual keyboard.

প্রিয়াংশু.চ্যাটার্জী@बॉलीवुड.com

How about mixed languages? Like the case of a foreign worker assigned to another foreign department in yet a third country.

김기덕@बॉलीवुड.ประเทศไทย.com

I'm not going to tell you what languages these are in. Just pretend you were handed these on business cards or saw them on a slide deck at a conference and can't get a digital copy. Let me know how it goes.

Copy & Paste?
I wonder how gmail will handle the type-ahead for addy's in diff languages. Especially since I only type in US English.
Good luck typing in those characters, or even knowing how to pronounce them if you had a "sounds-like" index.
So, about those email validation regexes...
A refresher, for people who might not have come across this before: parsing even a limited subset of all possible email addresses with a regex is hard:

http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html

Does this mean Gmail now supports creating non-Latin email addresses for Google accounts? I tried creating one just now (in the US) and got: "Please use only letters (a-z), numbers, and periods."
Does it support emoji?
If it supports all characters defined by the Unicode standard, then it should. However, it wouldn't surprise me if Google blacklists the non-language symbols (like the emoji, shapes, etc.) because it is not within the scope of their goal. You can't read emoji like a word, so it wouldn't work well with the goal of making email addresses readable.
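If a provider did want to exclude symbols like that, one way to sketch it is with Unicode general categories. This is a hypothetical policy for illustration, not anything Google has described: letters (L*), combining marks (M*), digits (N*) and periods are allowed, everything else is refused.

```python
import unicodedata

def is_readable_local_part(local_part: str) -> bool:
    """Hypothetical policy: letters, marks, numbers and '.' only."""
    if not local_part:
        return False
    for ch in local_part:
        if ch == ".":
            continue
        # The general category is a two-letter code such as 'Ll', 'Lo',
        # 'Mn', 'Nd', or 'So' (Symbol, other), which covers most emoji.
        if unicodedata.category(ch)[0] not in ("L", "M", "N"):
            return False
    return True

print(is_readable_local_part("søren"))
print(is_readable_local_part("宮本茂"))
print(is_readable_local_part("smile😀"))
```

Under this rule the first two are accepted and the emoji one is refused, since U+1F600 has category So.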