by Entangled 3499 days ago
Web browsers should have an option to show non-ascii chars in urls in red.
9 comments

This would be a great solution. Allowing Unicode characters in domain names is just inviting trouble. I understand that people with non-Latin scripts want domain names in their own language and alphabet, but there are way too many Unicode characters that can make a fake domain name look legitimate.

Showing non-ascii in red would be an easy solution for everybody.

It seems like a reasonable compromise would be to allow domain names in non-Latin languages as long as the entire name is in the character set of a specific language. So, if your name is in English, that's fine. If it's in, say, Cyrillic, that's fine too. But if you mix English and Cyrillic characters, that's not allowed. It wouldn't necessarily eliminate all name look-alikes, but it would get rid of most of them.
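
For what it's worth, here is a rough Python sketch of that single-script rule. It is only an approximation: the standard library does not expose the Unicode Script property, so this assumes the first word of each character's Unicode name (LATIN, CYRILLIC, CJK, ...) is a good-enough stand-in. The function names are made up for illustration.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used in one domain label.

    Assumption: the first word of the Unicode character name stands in
    for the real Script property, which the stdlib does not provide.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch in "-.":
            continue  # digits and separators are shared by every script
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_mixed_script(label: str) -> bool:
    """True if the label draws letters from more than one script."""
    return len(scripts_in_label(label)) > 1

print(is_mixed_script("apple"))        # all Latin
print(is_mixed_script("\u0430pple"))   # Cyrillic "а" followed by Latin "pple"
```

A real implementation would have to whitelist combinations that are normal within one language (Japanese mixes kanji, hiragana and katakana, for instance), so treat this as a starting point only.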
That's supposed to be one of the rules at the registrar level, but it's one that gets ignored in practice.

I have heard proposals that mixed-script IDNs get converted to punycode in URL display, but I don't know if any browser has fully implemented that yet.
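
A minimal sketch of what that display rule could look like, in Python. This is not what any browser actually does; `display_host` is a hypothetical name, the script detection is the crude "first word of the Unicode character name" heuristic, and Python's built-in `idna` codec follows the older IDNA 2003 rules.

```python
import unicodedata

def display_host(host: str) -> str:
    """Return the host as typed, or its punycode form if any label mixes scripts."""
    for label in host.split("."):
        # Crude script guess: first word of the character name (assumption).
        scripts = {
            unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            for ch in label
            if ch.isalpha()
        }
        if len(scripts) > 1:
            # The built-in "idna" codec converts each non-ASCII label to xn--...
            return host.encode("idna").decode("ascii")
    return host

print(display_host("example.com"))
print(display_host("\u0430pple.com"))  # Cyrillic "а" + Latin "pple"
```

A host written entirely in one script is shown unchanged, so a fully Cyrillic name would still display natively under this rule.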

Wouldn't a good compromise be to somehow highlight any characters that are outside of ASCII and the character set of the language you are using in the browser/OS?
Everybody who isn't colorblind anyway.
Use blue. The most common types of color blindness are red-green issues (there's a really tiny percentage that doesn't perceive colors at all, but really, really tiny; other than those, nobody has trouble with blue).

Source: I'm colorblind (protanope) and red would definitely be an issue. Android Studio, for example, is really annoying for me because the particular red they use for errors is very hard to distinguish from black.

That kind of violates the intuitive "red/orange/yellow is alarm, blue/green/black is expected" notion though. What you could do is put ASCII characters in blue and non-ASCII characters in red.
Isn't this a general argument against ever using red as a warning? Seems to prove too much.

Especially in this case, where there is unlikely to be a specialized class of scammers who go phishing only for people with red-green colorblindness. So long as browsers implement a feature that stops the phishing in 99% of cases, the scammers will try something else.

It's an argument against using red as the only warning sign.

Compare to Chrome's HTTPS indicator: it turns the "https://" part of the URL green (which I can barely distinguish as different, so it is useless to me) and adds a padlock icon.

Colorblind-friendly graphs might use both color and symbols to distinguish elements.
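
Applied to the URL case, a quick sketch of "color plus a symbol" might look like this in Python. It is terminal output with ANSI escape codes rather than a browser address bar, and the bracket marker and the name `mark_non_ascii` are arbitrary choices for illustration.

```python
RED = "\033[31m"   # ANSI escape: red foreground
RESET = "\033[0m"  # ANSI escape: reset attributes

def mark_non_ascii(url: str, use_color: bool = True) -> str:
    """Wrap every non-ASCII character in brackets, and optionally color it red."""
    out = []
    for ch in url:
        if ch.isascii():
            out.append(ch)
        else:
            marked = f"[{ch}]"  # the symbol survives even when color is invisible
            out.append(f"{RED}{marked}{RESET}" if use_color else marked)
    return "".join(out)

print(mark_non_ascii("http://\u00e5bus.fi", use_color=False))  # http://[å]bus.fi
```

The brackets carry the warning on their own, so someone who can't see the red still gets the signal.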

-Tritanopes may beg to differ (regarding the blue not being an issue, that is.)

Significantly less common than red/green deficiency, though - I only know of one more on the island I live on (pop. 15,000 or so)

Wasn't aware of that one. Significantly fewer people affected, though (I think red-green is something like 10% of males).
-Yup, the affected population is so small I don't bother suggesting to websites and software publishers that they may want to adapt their colour schemes anymore

Kudos to my employer, though - after some discussion, I was given a small budget and our SCADA GUI frontends now sport colour palettes optimized for deuteranopes, protanopes and tritanopes.

We've got a couple of very grateful feedbacks - and, unsurprisingly, quite a bunch of 'Gee, did you have some colorblind sod do your GUIs? My display looks like a Grateful Dead cover!' from people who've inadvertently messed with accessibility settings...

> people with non-Latin scripts want domain names in their own language

I've yet to see a useful site with a Cyrillic domain. Theoretically it sounds good, but in practice everyone still uses Latin domains. Maybe it'll change with time.

Same thing here in Japan with Kanji domains. Although I believe Windows XP usage here is still not insignificant...
Don't even show the suspect URL, show "THIS MIGHT BE A SCAM", with some kind of hover over showing the URL, and some way to click to more information.
Why?

Non-latin alphabet domain names do have legitimate uses, although they are very rarely used.

Except by a third of all people who live in China and India. Not everyone speaks a language that is representable in the latin alphabet. In fact, a very large percentage of people do not.
And it is then worth noting that as it stands, the attitudes of Western developers with respect to text input and name lookup have so horribly screwed the Chinese with respect to domain names that they started using numbers instead of letters for their major web properties.

https://newrepublic.com/article/117608/chinese-number-websit...

I live in India. I have never seen a non-Latin alphabet domain, except when I opened <some hindi word>.<tld> and <poop emoji>.com just out of curiosity. Could you show me some non Latin alphabet domain names that are used?

I am not claiming that everyone speaks a language that is representable in the Latin alphabet.

China and India don't pose a problem since Pinyin uses standard ASCII characters and neither Chinese characters nor Brahmic scripts have any symbols that resemble ASCII characters.
For the same reason that my email client occasionally tells me "this may be a scam," even though sometimes it's not and I act accordingly. Based on whatever criteria it's using, the data received has a somewhat higher chance of being illegitimate.

We as (technical) humans can recognize (hence this discussion) that the use of this uncommon G is meant to mislead you into thinking you're going to Google, when in fact you're going to Hell. I'd like to be warned of that possibility.

In this case, the extremely oversimplified algorithm might be "does the domain, as filtered down to canonical characters, represent one of the top five destination domains, yet go somewhere else if not canonicalized?"
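
Here's roughly what I mean, as a toy Python sketch. Everything in it is an assumption for illustration: the `CONFUSABLES` table is a hand-picked handful of look-alikes (a real one would be far larger), and `TOP_DOMAINS` is a made-up list of destinations.

```python
# Illustrative only: a few hand-picked look-alikes, not a complete table.
CONFUSABLES = {
    "\u0262": "g",  # LATIN LETTER SMALL CAPITAL G
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}

# Hypothetical list of popular destinations to protect.
TOP_DOMAINS = {"google.com", "facebook.com", "youtube.com"}

def canonicalize(domain: str) -> str:
    """Map each known look-alike character to its ASCII counterpart."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain.lower())

def looks_like_spoof(domain: str) -> bool:
    """True if the domain canonicalizes to a top domain but is not that domain."""
    canon = canonicalize(domain)
    return canon in TOP_DOMAINS and domain.lower() != canon

print(looks_like_spoof("\u0262oogle.com"))  # True: small-capital G in place of "g"
print(looks_like_spoof("google.com"))       # False: it is the real thing
```

Domains that don't canonicalize to anything on the list are left alone, so ordinary non-ASCII names wouldn't trigger the warning.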

The Chinese will be thrilled!
Chinese people will be fine since all Chinese URLs are either ASCII compliant or use Chinese characters, which can't be confused with any ASCII characters.

Russians would definitely be pissed though.

To my understanding, the Unicode standard encodes an ASCII transliteration for each Unicode symbol, but what about typographical similarities? Wouldn't that be a hard problem? Perhaps there are two Unicode characters that look exactly the same (using a given typeface) but have different transliterations. Or vice versa: two totally different-looking characters share a transliteration and give false alarms.
Just handle .рф domains (and the Serbian Cyrillic ccTLD) specially.
It's not a great solution since it requires knowing the difference between ASCII and Unicode... I would argue that a user who is vulnerable to falling for unicode characters in domain names won't have that knowledge.
"Neat, Sparkasse now even has a colored domain name! Now where did I put my TAN device again?"

- Average User

Cool, my head of marketing department wants a RED domain.
Chinese character domains would be shown in red letters. I think it's a good choice of color. :)
What about websites without Chinese characters? I know in Asia, having red colored names is kind of offensive (evokes the Reaper's 'hit list').

Would be annoying if [name].me or whatever is red!

Black is associated with death in western culture and no one seems to be bothered.
I agree with you completely. Cultural sensitivity is a difficult thing when you have a global audience for your product or service.

Maybe as part of the locale configuration, in addition to number and date format, people should pick a friendly and an offensive color! :)

No reason for me to try to pick out specific characters. I won't notice. Plus, it won't work for zero width characters, and I might miss it for really tiny ones.

Give me a popup warning explaining the problem when I try to visit the site, same as I get for certificate problems.
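
On the zero-width point, a check like the following could be what triggers such a warning. It's a small Python sketch under one assumption: that "invisible" means Unicode general category `Cf` (format characters), which covers zero-width space and the zero-width joiners but not every character that renders tiny. The function name is made up.

```python
import unicodedata

def invisible_chars(host: str) -> list:
    """Return (index, character name) pairs for format characters in the host."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(host)
        if unicodedata.category(ch) == "Cf"  # Cf = "Other, format"
    ]

print(invisible_chars("exam\u200bple.com"))  # [(4, 'ZERO WIDTH SPACE')]
```

An empty list means nothing suspicious of this kind was found; anything else is a reason to stop and explain before loading the page.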

They should already only show punycode for characters outside your locale.

'ɢ' is obviously an exception since (I imagine) it's considered to be in your locale, but maybe it shouldn't be.

That would be very confusing for multilingual users. Just because my OS is configured to use a certain locale, doesn't mean I don't read text in scripts not considered part of it.
Your OS (and browser) support multiple languages, so if you speak a language it should be in the list.
They are of course, but if you use that list instead of a single locale, you end up with a solution that only highlights 'strange' characters when they are not part of your language/locale set. So for someone who speaks only Latin character based languages you could highlight all Cyrillic characters, but for someone who speaks Russian you still have the original problem (it's not as if you can just highlight all Latin characters in their case!).
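
To make that concrete, here is a small Python sketch of highlighting by language list. The assumptions: a user's languages have already been turned into a set of script names, and the script of a character is guessed from the first word of its Unicode name (the stdlib has no real Script property). `unexpected_chars` is a hypothetical helper.

```python
import unicodedata

def unexpected_chars(host: str, user_scripts: set) -> list:
    """Return the letters in host whose (approximate) script the user does not read."""
    flagged = []
    for ch in host:
        if not ch.isalpha():
            continue  # skip dots, digits and hyphens
        script = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        if script not in user_scripts:
            flagged.append(ch)
    return flagged

spoof = "\u0430pple.com"  # Cyrillic "а" followed by Latin "pple"
print(unexpected_chars(spoof, {"LATIN"}))              # the Cyrillic letter is flagged
print(unexpected_chars(spoof, {"LATIN", "CYRILLIC"}))  # nothing is flagged
```

The second call is the Russian speaker's case: both scripts are "theirs", so the look-alike passes unnoticed.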
It should display each code page with a different color. That would make the scheme useful for non-English-speaking people too.
\o/ rainbow URLs ftw!
It would need a more complex solution than that. For example, this is the website of a local bus company where I live:

http://åbus.fi

The characters are from the Latin character set, but non-ASCII. Highlighting the Å in red would look pretty confusing. And in many countries you want the entire domain name written in non-ASCII characters, depending on the language. E.g. websites in Russia, China, India, etc...

And on by default