
Hacker News

by gardarh 3529 days ago

Actually, I think the strictness of python3 with regard to bytes/strings is particularly useful for people who speak English as their first language. python3 forces them to write software that works with international strings, whereas python2 code written by an English-speaking person would in many cases simply be broken for international users.

And believe me, as an "international" person, I find this profoundly annoying. It has trained me to avoid the non-ASCII letters of my own language, which of course "castrates" the written language somewhat (to the point that when people use non-ASCII letters for folder names I think they are just asking for trouble; how normal is that?)
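
To make the python2/python3 difference concrete, here is a minimal sketch run under python3 (the helper `join_label` is hypothetical). In python2, `'user: ' + u'bob'` silently succeeds by implicitly decoding the byte string as ASCII, and only fails with a `UnicodeDecodeError` once the bytes contain non-ASCII data, which is exactly the input an English-speaking developer rarely tests with. python3 rejects the mix every time, regardless of content:

```python
def join_label(prefix, name):
    """Hypothetical helper that naively concatenates its two arguments."""
    return prefix + name


# Mixing bytes and str fails immediately in python3, even for pure-ASCII data,
# so the bug shows up on the developer's machine instead of a user's.
try:
    join_label(b"user: ", "bob")
except TypeError as exc:
    print("rejected:", exc)

# The fix is an explicit conversion at the boundary, with a chosen encoding.
print(join_label(b"user: ".decode("utf-8"), "Guðrún"))
```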

I love how upfront python3 is about the difference between raw bytes and strings. I work a lot with python, and in particular I do a lot of serializing and deserializing of data to and from binary blobs. Due to platform issues that I hope will be resolved in the next year or so, I can't currently use python3, but I keep all my code compatible with both, and all tests should run under both python2 and python3. Yes, I am firmly in the python3 camp :)
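
As an illustration of that kind of binary serialization, here is a minimal sketch of a length-prefixed string field. The 2-byte big-endian length prefix, the UTF-8 payload, and the names `pack_name`/`unpack_name` are all hypothetical choices, not a specific format. It uses only `struct` and explicit `encode`/`decode` calls, so it runs under both python2 and python3, and those calls mark the exact points where text becomes bytes and back:

```python
import struct


def pack_name(name):
    """Hypothetical format: 2-byte big-endian byte length, then UTF-8 bytes."""
    data = name.encode("utf-8")  # text -> bytes, explicitly, in one place
    if len(data) > 0xFFFF:
        raise ValueError("name too long for a 2-byte length prefix")
    return struct.pack(">H", len(data)) + data


def unpack_name(blob):
    """Inverse of pack_name: read the length prefix, then decode the payload."""
    (length,) = struct.unpack(">H", blob[:2])
    payload = blob[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated blob")
    return payload.decode("utf-8")  # bytes -> text, explicitly


name = u"Gar\u00f0ar"  # 6 characters, but 7 bytes once encoded as UTF-8
blob = pack_name(name)
print(repr(blob))
print(unpack_name(blob) == name)
```

Note that the prefix records 7 (the byte count), not 6 (the character count); python2 made it easy to blur those two, while python3 makes you say which one you mean.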

PyCharm's inspections definitely help with keeping code compatible with both 2 and 3.


cabalamat 3529 days ago

> When people use non-ASCII letters for folder names I think they are just asking for trouble

Well, they are.

> how normal is that?

Not as normal as it ought to be.

Folder names (and file names) are identifiers. Identifiers should identify. If you allow unicode characters, then you can have two character sequences that look the same but are actually distinct. This is confusing at best and, at worst (in URLs), could facilitate fraud.
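
A small python3 sketch of the lookalike problem: the two strings below render identically in most fonts, but the second uses CYRILLIC SMALL LETTER A (U+0430) in place of the Latin `a`. The example name is made up for illustration.

```python
import unicodedata

latin = "paypal"
mixed = "p\u0430ypal"  # made-up example: 2nd letter is U+0430, a Latin "a" lookalike

# The strings look the same on screen but compare unequal.
print(latin == mixed)

# Listing the character names exposes the substitution.
for ch in mixed:
    print(repr(ch), unicodedata.name(ch))
```

Unicode normalization does not merge these two: U+0430 and U+0061 are distinct characters, not canonically or compatibility-equivalent, so even an NFKC-normalized comparison still reports them as different.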


Xylakant 3529 days ago

> If you allow unicode characters, then you can have two character sequences that look the same but are actually distinct.

That would be a software bug. If you want to compare unicode strings, you need to normalize them first, following the rules laid out in the Unicode standard: https://en.wikipedia.org/wiki/Unicode_equivalence

git, for example, fails that test (try creating a repo with a file named 'ü' and checking it out on both a Mac and a Linux system).
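
A minimal python3 sketch of both points: 'ü' can be written precomposed (NFC, U+00FC) or decomposed (NFD, `u` followed by U+0308 COMBINING DIAERESIS). The two forms are canonically equivalent but produce different UTF-8 bytes, so a tool that compares file names byte for byte sees two different names. (The Mac/Linux mismatch is generally attributed to macOS's HFS+ storing names in decomposed form while Linux file systems keep whatever bytes they are given.) The helper `same_name` is hypothetical, and defaulting to NFC is an assumption; any single form works as long as both sides use it.

```python
import unicodedata

composed = "\u00fc"     # 'ü' as a single code point (NFC form)
decomposed = "u\u0308"  # 'u' + COMBINING DIAERESIS (NFD form)

# The raw strings, and their UTF-8 bytes, differ...
print(composed == decomposed)
print(composed.encode("utf-8"), decomposed.encode("utf-8"))


def same_name(a, b, form="NFC"):
    """Hypothetical helper: compare two names after normalizing both to one form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# ...but they compare equal once both are normalized.
print(same_name(composed, decomposed))
```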

There are further issues with glyphs that are distinct characters but look the same in a given font, but what's your proposal? That everyone transliterates everything to ASCII? That URLs display punycode?
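
For reference, "displaying punycode" means showing the IDNA ASCII-compatible form of a host name. A minimal python3 sketch using the standard library's built-in `idna` codec; the lookalike host name is made up for illustration, with a Cyrillic `а` in place of the Latin `a`:

```python
lookalike = "p\u0430ypal.com"  # made-up host name; 2nd letter is Cyrillic U+0430
genuine = "paypal.com"

# Pure-ASCII labels pass through unchanged...
print(genuine.encode("idna"))

# ...while a label containing non-ASCII characters becomes an "xn--" label,
# which makes the substitution visible when displayed in this form.
print(lookalike.encode("idna"))
```

Python's built-in `idna` codec follows the older IDNA 2003 rules; the third-party `idna` package implements IDNA 2008, which is stricter about which characters it accepts.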


pjc50 3529 days ago

> Identifiers should identify

Except that, without Unicode, you now have to use a slightly restricted subset of many European languages and cannot use Cyrillic, Chinese, etc. at all. So people who are not operating in an English environment are limited in their ability to clearly identify things.


Someone 3529 days ago

"then you can have two character sequences that look the same but are actually distinct."

If you accept that argument, you should also remove one character from each of the pairs 1/l, O/0, S/5, etc.
