| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jordanb 190 days ago

It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

Organizations struggled with it but they struggle with basically every breaking change. I was on the tooling team that helped an organization handle the transition of about 5 million lines of data science code from python 2.7 to 3.2. We also had to handle other breaking changes like airflow upgrades, spark 2->3, intel->amd->graviton.

At that scale all those changes are a big deal. Heck even the pickle protocol change in Python 3.8 was a big deal for us. I wouldn't characterize the python 2->3 transition as a significantly bigger deal than some of the others. In many ways it was easier because so much hay was made about it there was a lot of knowledge and tooling.

2 comments

xscott 190 days ago

> It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

They should've just used Python 2's strings as UTF-8. No need to break every existing program, just deprecate and discourage the old Python Unicode type. The new Unicode type (Python 3's string) is a complicated mess, and anyone who thinks it is simple and clean isn't aware of what's going on under the hood.

Having your strings be a simple array of bytes, which might be UTF-8 or WTF-8, seems to be working out pretty well for Go.

MangoToupe 190 days ago

I can't say i've ever thought "wow I wish I had to use go's unicode approach". The bytes/str split is the cleanest approach of any runtime I've seen.

zahlman 188 days ago

What you propose would have, among other things, broken the well established expectation of random access for strings, including for slicing, while leaving behind unclear semantics about what encoding was used. (If you read in data in a different encoding and aren't forced to do something about it before passing it to a system that expects UTF-8, that's a recipe for disaster.) It would also leave unclear semantics for cases where the underlying bytes aren't valid UTF-8 data (do you just fail on every operation? Fail on the ones that happen to encounter the invalid bytes?), which in turn is also problematic for command-line arguments.

JoshTriplett 190 days ago

With the benefit of hindsight, though, Python 3 could have been done as a non-breaking upgrade.

Imagine if the same interpreter supported both Python 3 and Python 2. Python 3 code could import a Python 2 module, or vice versa. Codebases could migrate somewhat more incrementally. Python 2 code's idea of a "string" would be bytes, and python 3's idea of a "string" would be unicode, but both can speak the other's language, they just have different names for things, so you can migrate.

kstrauser 190 days ago

That split between bytes and unicode made better code. Bytes are what you get from the network. Is it a PNG? A paragraph of text? Who knows! But in Python 2, you treated them both as the same thing: a series of bytes.

Being more or less forced to decode that series into a string of text where appropriate made a huge number of bugs vanish. Oops, forget to run `value=incoming_data.decode()` before passing incoming data to a function that expects a string, not a series of bytes? Boom! Thing is, it was always broken, but now it's visibly broken. And there was no more having to remember if you'd already .decode()d a value or whether you still needed to, because the end result isn't the same datatype anymore. It was so annoying to have an internal function in a webserver, and the old sloppiness meant that sometimes you were calling it with decoded strings and sometimes the raw bytes coming in over the wire, so sometimes it processed non-ASCII characters incorrectly, and if you tried to fix it by making it decode passed-in values, it start started breaking previously-working callers. Ugh, what a mess!

I hated the schism for about the first month because it broke a lot of my old, crappy code. Well, it didn't actually. It just forced me to be aware of my old, crappy code, and do the hard, non-automatable work of actually fixing it. The end result was far better than what I'd started with.

JoshTriplett 190 days ago

That distinction is indeed critical, and I'm not suggesting removing that distinction. My point is that you could give all those types names, and manage the transition by having Python 3 change the defaults (e.g. that a string is unicode).

kstrauser 190 days ago

I’m a little confused. That’s basically with Python 3 did, right? In py2, “foo” is a string of bytes, and u”foo” is Unicode. In py3, both are Unicode, and bytes() is a string of bytes.

JoshTriplett 189 days ago

The difference is that the two don't interoperate. You can't import a Python 3 module from Python 2 or vice versa; you have to use completely separate interpreters to run them.

I'm suggesting a model in which one interpreter runs both Python 2 and Python 3, and the underlying types are the same, so you can pass them between the two. You'd have to know that "foo" created in Python 2 is the equivalent of b"foo" created in Python 3, but that's easy enough to deal with.

MangoToupe 189 days ago

Ok who would suggest this when the community could take a modicum of responsibility

zahlman 188 days ago

Why would I ever risk inflicting a stack trace like

  Traceback (most recent call last):
    File "x.py", line 2, in <module>
      foo.encode()
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

on a user of Python 3.x where it isn't possible? (Note the UnicodeDecodeError coming from an attempt to encode.)

MangoToupe 190 days ago

> With the benefit of hindsight, though, Python 3 could have been done as a non-breaking upgrade.

Not without enormous and unnecessary pain.

JoshTriplett 190 days ago

It would absolutely have been harder. But the pain of going that path might potentially have been less than the pain of the Python 2 to Python 3 transition. Or, possibly, it wouldn't have been; I'm not claiming the tradeoff is obvious even in hindsight here.

MangoToupe 189 days ago

I think you have causation reversed: it would have been at least two orders of magnitude greater to act like moving to python 3 was harder than staying. But you do you boo :emoji-kissey-face:

SoftTalker 189 days ago

Pain on whose part? There was certainly pain porting all the code that had to be ported to Python 3 so that the Python developers could have an easier time.

MangoToupe 189 days ago

Yes, exactly. customers need to stop acting like a bitch if they wanna be taken seriously