by cabalamat 3487 days ago
> When people use non-ASCII letters for folder names I think they are just asking for trouble

Well, they are.

> how normal is that?

Not as normal as it ought to be.

Folder names (and file names) are identifiers. Identifiers should identify. If you allow unicode characters, then you can have two character sequences that look the same but are actually distinct. This is confusing at best and at worst (in URLs) could facilitate fraud.

3 comments

> If you allow unicode characters, then you can have two character sequences that look the same but are actually distinct.

That would be a software bug. If you want to compare Unicode strings, you need to normalize them first, following the rules laid out in the standard: https://en.wikipedia.org/wiki/Unicode_equivalence
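To make that concrete, here is a minimal Python sketch of normalizing before comparing. The helper name `same_name` and the choice of NFC are my own assumptions for illustration, not something the standard mandates for file names.

```python
import unicodedata


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both into the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00fc"     # u with diaeresis as a single precomposed code point
decomposed = "u\u0308"  # plain 'u' followed by a combining diaeresis

# A naive comparison treats the two spellings as different strings.
print(composed == decomposed)

# After normalization they compare equal.
print(same_name(composed, decomposed))

# Normalization does not merge lookalikes from different scripts:
# Latin 'a' and Cyrillic 'а' (U+0430) stay distinct.
print(same_name("a", "\u0430"))
```

Note that this only handles canonically equivalent sequences. Characters that merely look alike are a separate problem.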

Git, for example, fails that test. (Try creating a repo with a file named 'ü' and checking it out on both a Mac and a Linux system.)

There are further issues with glyphs that are distinct characters but look the same in a given font, but what's your proposal? Should everyone transliterate everything to ASCII? Display Punycode in URLs?
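For reference, this is roughly what "display Punycode" would mean in practice. It is a small sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules). The helper name `to_ascii_host` and the example host names are made up for illustration.

```python
def to_ascii_host(host: str) -> str:
    """Return the ASCII (Punycode) form of a host name via the built-in idna codec."""
    return host.encode("idna").decode("ascii")


latin = "apple.example"           # all Latin letters
lookalike = "\u0430pple.example"  # first letter is Cyrillic 'а' (U+0430)

# A pure-ASCII name passes through unchanged.
print(to_ascii_host(latin))

# The lookalike becomes an 'xn--' label, so the difference is visible.
print(to_ascii_host(lookalike))

# A legitimate non-English name gets the same treatment.
print(to_ascii_host("b\u00fccher.example"))
```

The trade-off is visible in the last line: the spoofed name is exposed, but an ordinary German word becomes unreadable too.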

> Identifiers should identify

Except that, without Unicode, you have to use a slightly restricted subset of many European languages and cannot use Cyrillic, Chinese, etc. at all. So people who are not operating in an English environment are limited in their ability to clearly identify things.

"then you can have two character sequences that look the same but are actually distinct."

If you accept that argument, you should also remove either 1 or l, O or 0, S or 5 etc.