Hacker News
by ptx 4107 days ago
Python 3 is useful for a beginner to start with because they don't have to learn the deprecated cruft that Python 2 preserves for compatibility. If they just learn the Python 3 way of doing things, it generally works in both 2 and 3.

In his explanation of classes he has several paragraphs about old-style vs. new-style classes, ending it with:

  "Just completely ignore the idea of old style versus
  new style classes and assume that Python always requires
  (object) when you make a class. Save your brain power for
  something important."
Python 3 only has new-style classes, so the entire explanation could have been left out, allowing the beginner to, as he recommends, focus on more interesting things.
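A quick way to see that Python 3 has only new-style classes: a class written without `(object)` ends up with exactly the same base and metaclass as one written with it. This is a small sketch assuming a Python 3 interpreter; the class names are placeholders.

```python
class Implicit:            # no explicit base class
    pass

class Explicit(object):    # the Python 2 "new-style" spelling
    pass

# In Python 3 both inherit directly from object and are instances of type,
# so the (object) is simply redundant.
print(Implicit.__bases__, Explicit.__bases__)          # (<class 'object'>,) (<class 'object'>,)
print(type(Implicit) is type, type(Explicit) is type)  # True True
```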

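Another example is his advice, in exercise 11, to avoid the input() function because of its security problems (Python 2's input() evaluates whatever the user types as a Python expression). Python 3 doesn't have that function: its input() is the old, safe raw_input() under a new name, so the beginner doesn't have to remember to avoid anything.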
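To make the difference concrete, here is a rough sketch that feeds a line to Python 3's input() through a fake stdin (the helper `read_line_with_input` is hypothetical, written only for this illustration). Python 3 hands back the text untouched; Python 2's input() behaved like `eval(raw_input())`, which the last line imitates to show why it was dangerous.

```python
import io
import sys

def read_line_with_input(fake_stdin: str) -> str:
    """Hypothetical helper: run Python 3's input() against fake_stdin instead of the keyboard."""
    saved = sys.stdin
    sys.stdin = io.StringIO(fake_stdin)
    try:
        return input()      # returns the typed line as a str; nothing is evaluated
    finally:
        sys.stdin = saved

typed = read_line_with_input("__import__('os').getcwd()\n")
print(repr(typed))          # just the text the "user" typed

# Python 2's input() amounted to eval(raw_input()), so the same line would run code:
print(eval(typed))          # prints the current working directory: arbitrary code ran
```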

And of course, there's Unicode, which this book seems to completely ignore. A beginner starting with Python 3 has to learn the "Unicode sandwich" approach[1], which applies very well to Python 2 code as well. But someone starting with Python 2 can easily be confused about the concept (because the language is confused) and will have a hard time getting things to work correctly. For example, the book recommends that people "from another country" set their source encoding to UTF-8 – good luck printing things on Windows.

[1] https://www.youtube.com/watch?v=sgHbC6udIqc

No one's saying that Python 3 doesn't improve on Python 2 in significant and tangible ways. The key phrase is "If you learn Python 3 then you'll still have to learn Python 2 to get anything done", which is absolutely true in my experience.[1] This is because as soon as you hit a single dependency that doesn't support Python 3, you have to switch to Python 2. And there are still a good number of important modules that are Python 2 only. (And no, reimplementing the functionality of a dependency oneself is not an option for a beginning programmer.) And even if all the modules you need right now are available for Python 3, you might find later that the new feature you want to implement in an existing program requires a Python 2-only module. No matter how great Python 3 is, and how much we all wish we could switch to it, we can't just will all our dependencies to add support for it.

[1] Obviously your experience may vary depending on which modules are considered essential for your work.

I'm from "another country" and I always assumed all the characters that couldn't be printed on screen by Python were cmd.exe's (and PowerShell's) fault for not handling Unicode correctly, not a Python "error" per se.

Also, all my Python sources are set to UTF-8 and I never had any problem on Windows. Notepad.exe gives you the encoding option when you save a file, and every sensible text editor/IDE gives you encoding and line feed options.

So what would be the problem with Zed's tip? Have you ever tried to run a Python 2 script containing special characters without an encoding declaration? The interpreter dies instantly with an encoding error. It's easier to set the encoding to UTF-8 and get the program running than to comb through the whole thing checking whether you used a special character in the comments (which shouldn't affect program execution, but hey!). Also, this way you can write meaningful comments in your native language without worrying that they'll kill the interpreter right away.
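Python 2 assumed ASCII source when no coding line was present, which is why a single accented letter in a comment was fatal. As a stand-in on a Python 3 interpreter, this sketch declares `ascii` explicitly to reproduce that default and compares it with the UTF-8 declaration Zed recommends. The helper `compiles` is hypothetical.

```python
def compiles(source: bytes) -> bool:
    """Hypothetical helper: return True if CPython accepts these raw source-file bytes."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:        # undecodable source is reported as a SyntaxError
        return False
    return True

comment = "# Länge in Ångström\n".encode("utf-8")   # non-ASCII appears only in a comment
body = b"x = 1\n"

# Declaring ascii mimics Python 2's no-declaration default.
ascii_source = b"# -*- coding: ascii -*-\n" + comment + body
utf8_source = b"# -*- coding: utf-8 -*-\n" + comment + body   # Zed's tip

print(compiles(ascii_source))   # False: the comment alone breaks decoding
print(compiles(utf8_source))    # True
```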

The problem is that the Windows commandline, legacy Windows programs and modern Unix systems all use different encodings, so any particular string of bytes (representing non-ASCII text) will only be correct on one of them.

For example, let's say our Other Country is a Western European country. The encoding for non-Unicode Win32 programs will be Windows-1252 (more or less ISO 8859-1), and the encoding for MS-DOS programs and the commandline will be codepage 850.

In this scenario, this Python 2 program (saved as UTF-8):

  #-*- coding: utf-8 -*-
  print "ångström"
will print the wrong thing: "├Ñngstr├Âm" if you run it from the commandline, and "Ã¥ngstrÃ¶m" in a more Windowsy context (e.g. if you're writing it to a file and reading it in Notepad).
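The garbling can be reproduced in Python 3 by taking the UTF-8 bytes that the Python 2 program writes and decoding them the way each environment would. This is a sketch that assumes the Western European setup above (codepage 850 console, Windows-1252 for legacy Win32 programs).

```python
text = "ångström"
emitted = text.encode("utf-8")   # the bytes Python 2's print "..." writes from a UTF-8 source file

# The same bytes, interpreted by programs that assume other encodings:
print(emitted.decode("cp850"))    # ├Ñngstr├Âm  (commandline, codepage 850)
print(emitted.decode("cp1252"))   # Ã¥ngstrÃ¶m  (non-Unicode Win32 programs, Windows-1252)
print(emitted.decode("utf-8"))    # ångström    (a UTF-8 terminal, as on most modern Unix systems)
```

Each two-byte UTF-8 sequence (å is `C3 A5`, ö is `C3 B6`) turns into two unrelated characters in a single-byte codepage.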

To make it correct, you can apply the Unicode sandwich approach:

1) Know the input encoding and decode from that to Unicode.

2) Process the text as Unicode.

3) Know the output encoding and encode into that encoding on output.
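The three steps map directly onto `bytes.decode`, ordinary `str` operations, and `str.encode`. Here is a minimal sketch in Python 3; the function `shout` and the choice of upper-casing as the "processing" step are hypothetical, picked only to have something to do in the middle.

```python
def shout(raw: bytes, input_encoding: str, output_encoding: str) -> bytes:
    """Hypothetical example: upper-case some text using the Unicode sandwich."""
    text = raw.decode(input_encoding)      # 1) decode from the known input encoding
    text = text.upper()                    # 2) process it as Unicode (str)
    return text.encode(output_encoding)    # 3) encode into the known output encoding

# Windows-1252 bytes in, codepage 850 bytes out for a Western European console:
legacy_bytes = "ångström".encode("cp1252")
console_bytes = shout(legacy_bytes, "cp1252", "cp850")
print(console_bytes.decode("cp850"))   # ÅNGSTRÖM
```

Only the two edges know about encodings; the middle never touches bytes, which is what keeps the output correct regardless of where the input came from.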

In other words, making it a Unicode string means Python decodes the literal from whatever encoding you declared for the file, and print then encodes it into whatever encoding your terminal happens to use, so this program will always (if the system is correctly configured) print the right thing:

  #-*- coding: utf-8 -*-
  print u"ångström"
In Python 3, UTF-8 source encoding and Unicode strings are the default, so the correct program becomes simply:

  print("ångström")
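Because a Python 3 `str` has no encoding of its own, the encoding step happens when `print()` writes to a text stream. A rough sketch, assuming an in-memory `io.TextIOWrapper` can stand in for the terminal (the helper name `bytes_printed` is hypothetical): the same string becomes different bytes for each console encoding, and each decodes back correctly.

```python
import io

def bytes_printed(text: str, terminal_encoding: str) -> bytes:
    """Hypothetical helper: capture the bytes print() sends to a stream using terminal_encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=terminal_encoding, newline="")
    print(text, file=stream)   # print() encodes the str using the stream's encoding
    stream.flush()
    return raw.getvalue()

for encoding in ("cp850", "cp1252", "utf-8"):
    out = bytes_printed("ångström", encoding)
    print(encoding, out, out.decode(encoding) == "ångström\n")
```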
The problem isn't extended characters in your Python script; it's how your Python script handles extended character data. Scripts written in Python 2 that ignore the existence of Unicode won't always do the right thing when they encounter non-ASCII strings in the wild.