Hacker News
by kr0bat 960 days ago
As someone who wasn't around for the Python 2->3 transition, what made it so painful?
11 comments

To provide a counter-example: the startup I was working for switched from python-2.7 to python-3.4 after that came out. It was fine, no major issues. I wrote a whole lot of python2.7 before that, and a whole lot of python3.5 afterwards. I maintained our deployment and local dev scripts, and a couple libraries and tools on pypi, which were compatible with both python3 and python2. It was very doable.

I think one mistake was promoting the "2to3" converter early on. It turned out better to add a couple features to python2.7 and python-3.2/3.3/3.4 which made it quite feasible to write code compatible with both.

It is perhaps ironic that writing code that deals with str/unicode/bytes across both python2 and python3 is a bit more complicated than just python2 - but, again, still doable. I did it for years, many popular libraries did it for years, until recently dropping python2 support. It worked.
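For readers who never saw dual-version code, here is a minimal sketch of the usual pattern, normally combined with `from __future__ import print_function, division, unicode_literals` at the top of each file. The names `text_type` and `binary_type` mirror the ones the six library exposes; the helpers `to_text` and `to_bytes` are made up for illustration and are not from any particular library.

```python
import sys

PY2 = sys.version_info[0] == 2

# Name the text and bytes types once, so the rest of the module never
# has to mention `unicode` (which does not exist on Python 3).
if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as text, decoding bytes if needed."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


def to_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as bytes, encoding text if needed."""
    if isinstance(value, binary_type):
        return value
    return value.encode(encoding)


decoded = to_text(b"caf\xc3\xa9")  # UTF-8 bytes in, text out, on both versions
encoded = to_bytes(decoded)        # and back again
```

The point of the pattern is to convert at the edges of the program and keep everything in between as text.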

My opinion is there were 2 major issues and a bunch of minor cuts.

The major issues (imo):

* Reliance on extending the runtime via the C-bindings (and that changing)

* A community that had largely gotten accustomed to stability being thrust into an enormous change all at once

I think people tend to be in a camp of either "this was good and needed to happen because python had unshakable warts" or "this was bad and we should have lived with language mistakes forever". I think the conflict between those two camps is what made it so painful: breaking changes were held back for years, and once one got in they all came at once. A migration plan that worked towards the changes incrementally would have let developers focus on one upgrade at a time rather than a full rewrite. Node gets a bad reputation for being "chaotic" or "constantly changing" but the changes are small and manageable comparatively. Go on the other hand has managed to maintain strong stability, but with an ecosystem that works primarily in the core language rather than in the implementation language (C for python).

I think the python 3 merge did a ton of damage to the community's willingness to encourage breaking changes that are needed and it's why packaging and runtime self hosting have been comparatively weak despite a huge userbase.

Not breaking apis is a great ideal but if you have to, breaking them in planned, bite-sized, frequent bursts is often MUCH MUCH better than once a decade.

I finished what will hopefully be my last 2->3 transition ever back in 2021. 6 months, ~180 commits, and 3,000-odd files in that single PR, after years of preparatory work by others. I needed a sabbatical after that.
The transition was non-incremental.

There was no way to use python 2 libraries in python 3. This, right out of the gate, made them feel more like different languages than a migration, and forced people to delay their own migration until all their dependencies had migrated.

(Example of how to do it properly: "netstandard" in C# libraries could be used in .Net Framework and .Net Core, despite those being very different runtimes. Heck, even the ancient Microsoft "_UNICODE" migration where all your functions got macroed to either functionA/functionW depending on which string type you were using was less miserable.)

Within that, it was difficult to migrate code file-at-a-time. If your project had really good isolation between modules and they all had separate matching unit tests, then you had a good chance. Otherwise it was "run program, whack bug, repeat x1000".

After a while libraries (six, future) were developed to allow you to write polyglot code that generally worked in both. This mitigated the print() issue. print() was less of a problem for large projects, more a problem for novices with outdated tutorials, and people with large script collections all of which broke individually.

Changing the string type, and return type of lots of IO functions, in a non-typechecked language, caused total mayhem. Suddenly all sorts of code which never cared about character encoding was forced to.
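A small sketch of what that looks like in Python 3, using a scratch file as a stand-in for whatever IO the program actually does. The variable names are arbitrary.

```python
import os
import tempfile

# Write a few UTF-8 encoded bytes to a scratch file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("na\u00efve\n".encode("utf-8"))

# Binary mode hands back bytes; nothing is decoded for you.
with open(path, "rb") as f:
    raw = f.read()

# Text mode hands back str, and now the encoding has to be right.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(path)

# Mixing the two types is an error in Python 3, where Python 2 would
# have coerced silently as long as everything happened to be ASCII.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Code that never thought about encodings in Python 2 hits that `TypeError` (or a decode error) the first time the two kinds of data meet.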

The sad thing was that the transition could have been incremental - the "polyglot option" arrived somewhere in the middle of the transition, and a lot of debate was still raging as if there was no way to use python2 libraries in python3 (and vice versa) when in our migration we had already taken the initiative and adapted the few (relatively small) libraries we needed so that they supported both python2 and python3.
> As someone who wasn't around for the Python 2->3 transition, what made it so painful?

All the people who screamed and yelled so much that they prevented Python 2 from having significant breaking changes. That meant that the needed changes kept piling up until the list was gigantic.

People forget that the Python 3 thing wasn't done in a vacuum. All the important people in Python had direct memory from the upgrade of Python 1 to Python 2 and what a big fiasco that was.

So many people dragged their feet on that that Guido et al. made a point of making 3.0 have hard, breaking changes in an attempt to force the upgrade through in a timely fashion instead of the long, drawn out, painful process that was the Python 1 to Python 2 change.

We all know, in hindsight, that the forces of inertia were FAR more intransigent than Guido and Co. estimated. However, that wasn't obvious looking forward.

I'm not sure there was any good solution. People would have pissed and moaned no matter what.

IMHO there was a good solution - launching python 3.0 only when you had a working solution to make a library that is usable in both python2 and python3 code, as was possible later on, and what IMHO was a key factor in making the migration actually work.
I don't think "intransigent" is a fair description. At the end of the day, Python 3 was a different language from Python 2, and people weren't going to switch languages if there wasn't a benefit to them. That's a perfectly reasonable position. The Python dev team extended the end of life date for Python 2 from 2015 to 2020 because they realized that Python 3 simply hadn't advanced enough by 2015 to make it worthwhile for all the Python 2 users to switch.
As someone who just started learning Python at the time, it was terribly discouraging that I couldn't even get a "hello world" script to work, following exactly the example in any tutorial. I kept thinking there must be something wrong with my system configuration, or my environment variables, my character encoding, or something. There's always a million things that can go wrong when getting started with an unfamiliar programming language.

I didn't imagine they'd made a breaking change affecting the print statement which, of course, no learn-Python-for-beginners site yet knew to warn you about. The error message at the time was not as helpful as it is now.

One of the big changes was to make many basic tools into lazy iterators instead of greedy lists (map, filter, range, etc.). I can remember having to teach my fellow scientist/engineer colleagues not to just loop over an index variable when working with certain datasets, but this was a major bit of friction because they were so used to Matlab and other languages where direct indexing is the primary method. While lazy execution is great for many things, it is not something that is necessarily common knowledge among people who use coding as a means to an end.
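A short sketch of the behavior that tripped people up, as it works in Python 3. The variable names are just for illustration.

```python
squares = map(lambda x: x * x, range(5))

# A map object is an iterator: it cannot be indexed.
try:
    squares[0]
    indexable = True
except TypeError:
    indexable = False

# It is consumed by the first full pass...
first_pass = list(squares)
# ...so a second pass finds nothing left.
second_pass = list(squares)

# range is lazy too, but unlike map it still supports len() and indexing.
r = range(10)
third = r[3]
count = len(r)
```

Wrapping the call in `list(...)` restores the Python 2 behavior wherever a real list is needed, at the cost of building the whole thing in memory.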

Additionally, because so many of the core libraries were 32-bit only or Python 2 only, you ended up having to either write your own version of them or just go back to 32-bit Python 2. Numpy in particular (and therefore transitively anything halfway useful for science and engineering) took several years to stabilize and I have many memories of having to dig into things like https://www.lfd.uci.edu/~gohlke/pythonlibs/ to get unofficial but viable builds going for the Windows machines we used. It was enough of a pain to deal with dependencies that I actually ended up rolling my own ndarray class that was horrendous but just good enough to get the job done.

davidjfelix outlines some of the issues, but I wanted to add in some that may have been simple upfront headaches that made people resistant:

- Simply put, print was ` print "text" ` instead of ` print("text") `. Looking back, this was such an annoying habit to break; but if your codebase had thousands of print statements, that becomes thousands of changes

- range(3) used to return a list [0, 1, 2] while Python 3's range(3) returns an object. If your codebase relied on that explicitly created list, then it'd break in Python 3.

- Division of two integers with `/` was originally integer division, so again if you expected an integer and are now getting a float (possibly with a non-zero fractional part), then more crashes

- except used to be except (Error1, Error2) as e and Python 3 explicitly requires each except to be on their own exception catch

All in all, tons of changes that you'd need to do before switching otherwise your codebase would crash. It also meant you couldn't rely on Linux's default Python (2.7). I never needed to make the switch on a production base, but hopefully you can see why someone would drag their feet
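To make the division and range items in the list above concrete, here is a sketch of the Python 3 behavior and the usual fixes. The list `items` is made-up data.

```python
# Division: `/` is true division on Python 3, `//` floors on both versions.
half = 7 / 2         # a float on Python 3 (was the int 3 on Python 2)
floor_half = 7 // 2  # an int everywhere

items = ["a", "b", "c", "d"]

# Using a float as an index is where old code typically crashed.
try:
    items[len(items) / 2]
    float_index_ok = True
except TypeError:
    float_index_ok = False

middle = items[len(items) // 2]  # the fix: ask for floor division explicitly

# range: wrap it in list() where a real, mutable list is needed.
numbers = list(range(3))
numbers.append(3)  # range(3).append(3) would raise AttributeError

print("middle item:", middle)
```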

[edit] Python was also still a "new kid on the block" of languages. Its popularity was growing, but since it was not an industry standard yet, these systems were mostly built by hobbyists, so I imagine there was plenty of just trying to find that mythical "free time" I keep hearing about.

Don't forget string literals:

Python 2 had b'byte string' and u'unicode string' with unmarked being 'byte string'

Python 3.0 had b'byte string' and unmarked being 'unicode string'

Because python 2 was so lenient with mix'n'matching the two string types, it wouldn't error if you only were using ASCII values and finding all these places where strings weren't quite right can get pretty difficult. It also meant libraries had to awkwardly always use b'byte string'.decode('utf8') if they wanted to create a unicode string and be compatible with both python 2 and 3.

Python 3.3 then reintroduced prefixed u'unicode strings' to make it significantly easier for libraries, simply by always using b'' and u'' instead of ever using ''. It also made any preexisting unicode-aware code "just work" without having to be converted from u'' to ''.
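A quick sketch of the three literal forms as Python 3.3+ sees them, with the Python 2 meaning in the comments. The variable names are arbitrary.

```python
raw = b"abc"   # bytes on Python 3 (an alias for str on Python 2.6+)
text = u"abc"  # text on Python 2 and on 3.3+; a SyntaxError on 3.0-3.2
plain = "abc"  # bytes on Python 2, text on Python 3

same_type = type(text) is type(plain)  # on Python 3 both are str
equal = raw == text                    # no implicit coercion on Python 3

# The pre-3.3 workaround mentioned above: build text from a bytes literal.
portable = b"abc".decode("utf8")
```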

I think I remember hearing about other similar compatibility changes made in either 3.4 or 3.5, but can't remember what they would have been.

except (Error1, Error2) as e still works in python3.
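For reference, a sketch of the tuple form running on Python 3. What Python 3 actually removed is the older comma spelling, `except ValueError, e:`; the `as` spelling works on 2.6+ and 3.x. The function `parse` is a made-up example.

```python
def parse(value):
    """Hypothetical example: convert to int, reporting which error occurred."""
    try:
        return int(value)
    except (TypeError, ValueError) as e:  # tuple form, valid on 2.6+ and 3.x
        return "bad input: " + type(e).__name__


results = [parse("42"), parse("x"), parse(None)]
```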
Everyone had to change all their code and dependencies because of breaking changes that had 0 benefits for most people.
the most important benefit that most people ignored until they needed to debug something was fixing the horribly broken exception system in Python 2, where you were losing your stack traces when reraising.
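A minimal sketch of what that looks like in Python 3, where exceptions carry their traceback and can be chained explicitly. `ConfigError` and `load_port` are hypothetical names for illustration.

```python
import traceback


class ConfigError(Exception):
    """Hypothetical application-level error."""


def load_port(settings):
    try:
        return int(settings["port"])
    except KeyError as exc:
        # `raise ... from` records the original exception as __cause__,
        # so its traceback is reported along with the new one.
        raise ConfigError("port is not configured") from exc


try:
    load_port({})
except ConfigError as err:
    cause = err.__cause__
    report = "".join(
        traceback.format_exception(type(err), err, err.__traceback__)
    )
```

The formatted `report` contains both tracebacks, the original `KeyError` first, instead of only the last exception raised.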

most people who don't come from native-English countries also immediately benefited from unicode awareness by default. there were a few folks who cried about having to prefix bytes with b'' when pushing text through sockets, a vocal but small minority.

> there were a few folks who cried about having to prefix bytes with b'' when pushing text through sockets

It wasn't just that; any program with string literals in it that were being treated as bytes in Python 2 would have to have all those string literals prefixed with b in Python 3; otherwise the program would break.

Also, defaulting to unicode meant having to have a default encoding that became critical in many more places--for example, the standard streams (stdin, stdout, stderr) were now unicode by default instead of bytes, so there were now plenty of new footguns when the standard stream encoding that Python guessed was wrong and you had no way to change it. Not to mention that if the standard streams were pipes instead of ttys, unicode made no sense anyway.
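A sketch of the mechanism behind that footgun, using an in-memory byte stream as a stand-in for a pipe so it does not touch the real standard streams. In Python 3, `sys.stdout` is a text layer of this kind over a byte stream (reachable as `sys.stdout.buffer`); the variable names here are arbitrary.

```python
import io

# Stand-in for a pipe: a byte stream with no inherent encoding.
pipe = io.BytesIO()

# A text layer over it, with the encoding stated rather than guessed.
writer = io.TextIOWrapper(pipe, encoding="utf-8", newline="\n")
writer.write("r\u00e9sum\u00e9\n")
writer.flush()
written = pipe.getvalue()

# With the wrong guess (say ASCII), the very same write fails outright.
ascii_writer = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_writer.write("r\u00e9sum\u00e9\n")
    ascii_writer.flush()
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Later releases added more control over the guess (the `PYTHONIOENCODING` environment variable, and `sys.stdout.reconfigure()` from 3.7), which helps but does not change the underlying point.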

* The "python" command pointed to "python2" on many Linux systems for a long time

* A lot of libraries were not ported for many years

* People kept repeating that "Python 3 is not ready yet", because they read an old statement on Stackoverflow and didn't consider that this might change in only a few years.

It was pointless. The python core team decided to obsolete every existing line of python code in exchange for basically nothing. On top of it a few features were changed in ways that felt outright vindictive (division, the u'' convention being the ones that affected every single codebase).
Unicode was a meaningful change to the language that it's difficult to imagine could have been done without breaking compatibility or repeating C's mistakes. There isn't much excuse for the rest, as forks with minuscule development resources managed to backport everything else. 3 did turn out a generally nicer language than 2, even if it was slower and caused a solid decade of pain.
Except that python2's `unicode` was a mistake. We already knew at the time that "just use utf-8 for everything, and never use numerical indexing" was the way to go, but instead python3 decided to leave us with no working string class, as opposed to python2 which at least had a mostly-working one.

That said, the major forces against the transition were that it was impossible to write code that worked with both versions for a long time:

* python 2.7 added support for a few python3 syntax/library features, but python 2.6 was very widely deployed on stable distros.

* it wasn't until python 3.3 (very late to actually be deployed) that you could even write a unicode string literal (you know, the thing they claimed the transition was all about) that still worked in python3

* python before 3.3 had a completely bogus idea of "unicode" on Windows anyway, even ignoring the API nonsense.

* python 3 completely broke the way indexing worked for `bytes`, making it produce integers (rather than single-codeunit instances) of all things.

* there was a lot of gratuitous package breakage. Instead of leaving deprecated shim packages, they just removed them, and you had to add a third-party dependency to get something that worked with both versions (and said dependency hooked into core parts of the interpreter in weird ways).
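On the `bytes` indexing point in the list above, a short sketch of the Python 3 behavior and the usual workaround of slicing instead of indexing. The variable names are arbitrary.

```python
data = b"abc"

first = data[0]           # an int on Python 3 (the 1-char string 'a' on Python 2)
first_slice = data[0:1]   # a length-1 bytes object on both versions
codes = list(data)        # iterating yields ints as well
rebuilt = bytes([first])  # turning one int back into a single byte
```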

It wasn't until around 3.5 that there started being any actual advantage of python3 at all. But there is still tons of code that is no longer possible to write.

Python is made for cobbling together things with string and duct tape, for users who would do such a thing. Any change, however good and trivial, will have them paralyzed, gnashing their teeth.
Not fair and accurate comment imho. Cobbling users didn't tell Linux distros not to ship Python 3.
It's how I see it ¯\_(ツ)_/¯ Distros are mostly cobbled together with stringly-typed bash scripts too.