by ryandvm 2343 days ago
Python 3 and IPv6 are the poster-children of how _not_ to do a major upgrade. I'm not sure what the right way is, but if the short-term advantages of the upgrade do not outweigh the immediate pain, prepare for the matter to drag out for _decades_.
5 comments

> I'm not sure what the right way is

The right way is to make sure that stuff that used to work in the previous version still works in the current version. Breaking people's work, especially work that spans multiple years, projects, knowledge, etc and expecting them to be happy about it is naive. Being condescending when they turn out to not be happy and try to avoid the unnecessary busywork forced on them does not help either.

This isn't just about Python, many libraries and languages (and some OSes - see iOS, macOS and to a slightly lesser extent Android) are terrible about this. The proliferation of semver with its normalization of breaking stuff (the fact that a dependency - be it a library or language or whatever - uses semver communicates that they have already decided that they will break backwards compatibility at some point) shows that most people are fine with breaking others' code.

The most painful breaking change was the string treatment. Breakage was necessary if you wanted to make it possible to have more confidence in the basic building blocks of python.

If you make a mistake when making a tool, you can either leave it in place, permanently causing pain for users, or you can try to find a path to fix it.

That being said, a Python 3.0 which was _just_ “can’t call decode on strings, encode on bytes”, with subsequent releases fixing up other stuff over time, would have been much nicer.

Like the “everything is iterators now” release could have happened later.
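
For anyone who missed that part of the migration, here is roughly what the iterator change looks like in practice. This is only a minimal sketch, and the variable names are made up for illustration: in Python 3, map() hands back a lazy iterator where Python 2 returned a list, so old code that indexed or re-read the result has to change.

```python
# Python 3: map() returns a lazy iterator, not a list as in Python 2.
names = ["ada", "guido"]
upper = map(str.upper, names)

# Indexing the iterator fails; Python 2 code doing upper[0] has to change.
try:
    upper[0]
    indexable = True
except TypeError:
    indexable = False

# An iterator can be consumed only once: the second pass sees nothing.
first_pass = list(upper)
second_pass = list(upper)
```

Wrapping the call in list() restores the old behaviour, which is most of what the automated porting tools did for this particular change.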

The constant stream of breaking changes in Python - that is, in the "standard library" which may as well be part of the language - is the most frustrating thing about Python. There are perfectly good projects that can no longer be run without major work, just because they were left unmaintained for a few years. This is a silly state of affairs, and depressingly common when the fix is really simple: version declarations. Feel free to move fast and break things, but always provide the old behaviour if the user puts a "version=3.2" flag in their code. There's no reason this mechanism couldn't have extended to every change in Python since its release.

If POV-Ray can do it, Python could have done it.

Free Pascal changed the string type some time ago to make it encoding aware (i'm not really a fan of the idea, but it was done for Delphi compatibility which is considered important by the FPC developers). I only had to change a handful of lines in my 10+ year old code (at the time, now it is older) to make things work (all were about treating the string as a byte array and manipulating the memory directly - i just changed the type to RawByteString which provides exactly that functionality).

AFAIK that was the biggest "breaking" change they introduced by far. In general i have code from 2007 that compiles out of the box and this sort of stability is why i stick with FPC (and C) despite it being messy sometimes.

If you make a mistake when making a language or API you should make sure whatever fix you come up with will keep the existing code working, most common way being that the old API is implemented in terms of the new (even if slower, things will keep working) or in the case of languages, new stuff that can conflict with existing code can be opt-in (Free Pascal often uses compiler submodes for this).

Yes, this makes implementing the library/language harder but it is going to be a bit of extra work for the implementors in exchange for avoiding A LOT of work for the users (especially when you consider all the combined time wasted in porting Python 2 to Python 3).
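
A rough sketch of the "old API implemented in terms of the new" idea, written in Python since that is the language under discussion. Both function names (encode_text and to_bytes) and the Latin-1 default are invented for illustration and do not come from any real library:

```python
import warnings


def encode_text(text: str, encoding: str = "utf-8") -> bytes:
    """New API (hypothetical): the caller is explicit about the encoding."""
    return text.encode(encoding)


def to_bytes(text: str) -> bytes:
    """Old API (hypothetical), kept as a thin wrapper over the new one.

    Existing callers keep working; they only see a DeprecationWarning.
    """
    warnings.warn(
        "to_bytes() is deprecated; use encode_text()",
        DeprecationWarning,
        stacklevel=2,
    )
    # Assumption for this example: the old function always used Latin-1.
    return encode_text(text, encoding="latin-1")
```

The wrapper costs the maintainers a few lines, while old call sites keep producing exactly the bytes they always did.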

The language we use at work (Delphi) changed its string type from ANSI to Unicode, and it took us less than a day to fix our ~500kloc code base, which does a _lot_ of string manipulations all over.

This was due to the hard work the people behind Delphi had put in to make the transition as smooth as possible.

Python 2's issues went beyond the string encoding.

Some functions would return either bytes or text (in python 2 parlance, strings or Unicode) depending on the input. People would call decode on text (despite it only making sense on bytes).

Ultimately Python 2 encouraged a programming model where if you just tested with ASCII everything worked but the instant one of your library users put in an accented character or a kanji everything would blow up.

Just to make things clear: many python 2 programs operated on bytes thinking they were text. There isn’t really a way of resolving this API without user intervention on declaration of intent (not saying 3.0 was perfectly right but not every change can be made backwards compatible if you still have existing code)
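
To make the bytes/text distinction concrete, here is a small sketch of what the "declaration of intent" looks like in Python 3. The sample string is arbitrary; the point is that you have to say which encoding the bytes are in, and the two types no longer mix silently:

```python
raw = "caf\u00e9".encode("utf-8")  # bytes, as they would come off a socket or file
text = raw.decode("utf-8")         # declaring intent: these bytes are UTF-8 text

# The two views differ once you leave ASCII: 5 bytes, 4 characters.
lengths = (len(raw), len(text))

# Python 3 refuses to concatenate str and bytes.
try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

# str has no decode() and bytes has no encode() any more.
has_wrong_methods = hasattr(text, "decode") or hasattr(raw, "encode")
```

With ASCII-only input the two lengths would match, which is exactly why Python 2 code that confused the two often passed its tests.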

> The right way is to make sure that stuff that used to work in the previous version still works in the current version.

But that too brings considerable downsides.

For all its merits, C++ is an extremely bloated language, getting even more complex every release, due in no small part to its commitment to backward compatibility.

There's no perfect answer. Python3's decision wasn't stupid, they just chose one downside over another.

C++ bloated largely because they decided to make it bloated - they didn't have to, they just decided to shove in whatever new idea sounded good without much concern about the language's size.

But despite that i 100% guarantee you that people who actually use the language and have large codebases are really glad that C++ is backwards compatible and they do not have to waste time refactoring code that works.

> they just decided to shove in whatever new idea sounded good without much concern about the language's size.

Compared to most languages, C++ is very slow moving, but also very old.

> despite that i 100% guarantee you that people who actually use the language and have large codebases are really glad that C++ is backwards compatible

Of course.

IPv6 is a perfect example of the second-system effect [1] as in it added a bunch of things no one needed (but might need someday) and didn't solve roaming. All IPv4 really needed was a bigger address space.

But as soon as you cross the mental threshold of making a breaking change (which expanding the IPv4 address space obviously was) then it's easier to convince yourself to make a bunch more breaking changes. And this is where Python3 really lost its way (IMHO).

One of the silliest design decisions in Python3 was (initially) removing string prefixes like u. Now obviously Python2 defaulted to ASCII and Python3 defaulted to Unicode, but this decision just made it harder to write libraries compatible with both, so much so that they added it back (around 3.2-3.3 IIRC).

There are also always decisions you make that in hindsight you wish you'd done differently (eg the mutable Date class in Java) but just because you're making breaking changes doesn't mean you should "fix" all of those. You still have to look at each one and ask yourself "does this really matter enough to justify changing it now?". The default answer is "no" and the bar for "yes" should be really high.

I feel like Python3 failed here too.

And look where we are. Python3 out in 2008 and we're still writing migration guides in 2019.

[1]: https://en.wikipedia.org/wiki/Second-system_effect

It's important to realize however that not everyone feels that way about Python3. I'm glad they fixed it and I wish they'd fixed more.

Upgrades are just hard.

See perl5 to perl6, or GWbasic to Qbasic to VB to VB.net.

You either make a clean break or keep all the warts. Either way folks are going to be unhappy.

Keep the warts: COBOL, Fortran, C, C++, PHP, Excel.

Ruby did a great job back in the day with their 1.9 release which changed the language to be Unicode friendly.

What did they do differently? I guess they benefited from hindsight.

> GWbasic to Qbasic

wait wuh? i thought these were just two of the dozen variants of the BASIC dialect... interesting!

Rename Perl 6 to Raku. People happier.

It makes sense, if a language is so different from and backwards incompatible with the previous version, to just rename it to something else.

Hence why a lot of people back in the day felt that Visual Basic .NET should be called Visual Fred instead :-P

At least Python3 brings an improvement. I still can't think of a good reason I should spend any time trying to figure out IPv6. Maybe my external router has to think about it, but that is about it.

Edit: If you down vote me please say why I should care about IPv6

The main mental stumbling block I have about IPv6 (and I know it's kind of a silly one, but it's honestly what I feel on introspection) is that I can't remember an IPv6 address off by heart. IPv4 addresses just _feel_ so much more human-consumable than IPv6 addresses. I can't imagine myself using IPv6 addresses on the command line like I do with IPv4 now.

There's also the issue that honestly, I have no idea what is using IPv6 and what's using IPv4 right now. On my internal network I only ever deal with IPv4, but I have no intuition as to what is using IPv6; I couldn't tell you off the top of my head if my ISP supports it.

> is that I can't remember an IPv6 address off by heart. IPv4 addresses just _feel_ so much more human-consumable than IPv6 addresses.

Stop remembering numbers meant for a machine, and use DNS. It will make your life so much easier. I spend ~$15/yr for my personal DNS name & hosting, and I never want to memorize an IPv4 or v6 address ever again.

> I have no idea what is using IPv6 and what's using IPv4 right now.

Any "end" machine (laptop, tablet, phone, server) is going to be dual stack, supporting both IPv4 and IPv6. If it is able to auto-configure an IPv6 address, then it will use IPv6 wherever the destination supports it. Otherwise, it won't. If a domain is IPv4-only, it'll use the IPv4 address. All of this is automatic.

The big issue, for home consumers, is that a lot of ISPs are dragging their feet. They don't need anything from the customers — they just need to get it deployed & turned on. Generally, typing "what is my IP address?" into Google will tell you if you have working IPv6; it will display an IPv6 address if you do.

In the cloud… some cloud vendors have been dragging their feet about rolling support out. You need to do some things, like associate an AAAA record to your domain (s.t. it resolves to an IPv6 address), and make sure things like logging can handle the new addresses, or if you implement IP blocking, that you can block those addresses/networks. If you're writing network code, you need to check that you're not making assumptions about the socket type. You can also do things like HTTP proxy from an IPv6 connection to an IPv4 VLAN, e.g., I think w/ an ELB. That is,

  client <-- HTTP/IPv6 --> ELB <-- HTTP/IPv4 --> backend server
which allows a partial upgrade. None of it is terribly hard, but typical project management puts upgrading to future tech in the perpetual backlog.
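
As a sketch of the "make sure IP blocking can handle the new addresses" point, Python's standard ipaddress module parses both families, so a blocklist check does not have to assume IPv4. The networks below are documentation ranges picked for illustration, and is_blocked is a made-up helper name:

```python
import ipaddress

# Hypothetical blocklist mixing IPv4 and IPv6 networks (documentation ranges).
BLOCKED = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8:bad::/48"),
]


def is_blocked(client: str) -> bool:
    """Return True if the client address falls in any blocked network."""
    addr = ipaddress.ip_address(client)  # parses either address family
    # Only compare against networks of the same IP version.
    return any(addr.version == net.version and addr in net for net in BLOCKED)
```

Code that stored addresses as four-byte integers or matched them with a dotted-quad pattern is the kind of hidden assumption that needs auditing.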

> Stop remembering numbers meant for a machine, and use DNS

This is not a good solution. A lot of times people use IPs because DNS is not available or is more complex to set up. Say you are:

- Setting up and configuring a network.
- Setting up the firewall.
- Inspecting traffic and seeing where it goes.
- Verifying that the DNS resolutions are being done correctly.

Most people already use DNS, because it's more comfortable. But anything that requires working with the network is now going to be much more complicated. For example, I can remember some network prefixes and know whether they are in building A or building B. IPv6 makes that much, much more difficult.

Also, the numbers are not even that well meant for a machine. Text representation of IPv4 is easy to detect. IPv6 representation? Good luck with that.

All of that because of the decision to use 128 bits instead of 64. 2^64 addresses would be more than enough, representable without issues in usual data types (uint64_t is a standard C type, uint128_t is not) and the problem of ipv6 representation would be far less relevant.

> A lot of times people use IPs because DNS is not available or is more complex to set up.

For the rare cases where a memorized IPv4 address was reasonable you should just assign simple IPv6 addresses ending in a short suffix like ::1. The prefix can be 64 bits or less, which isn't too much to remember, and will be the same for the entire network.

> For example, I can remember some network prefixes and know whether they are in building A or building B. IPv6 makes that much, much more difficult.

Nothing prevents you from assigning visually distinctive prefixes to different buildings. The larger address space actually makes this much easier. For example, you could use xxxx:yyyy:zzzz:a::/64 for every host in building A.
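
A quick sketch of that scheme using Python's ipaddress module. The /48 below is a documentation prefix standing in for whatever your site was actually delegated, and the building labels and the building_of helper are made up:

```python
import ipaddress

# Assumption: the site was delegated this /48 (a documentation prefix here).
site = ipaddress.ip_network("2001:db8:1234::/48")

# Hand-picked, easy-to-recognize /64s: the fourth group names the building.
building_a = ipaddress.ip_network("2001:db8:1234:a::/64")
building_b = ipaddress.ip_network("2001:db8:1234:b::/64")

# A memorable static host address inside building A.
gateway_a = ipaddress.ip_address("2001:db8:1234:a::1")


def building_of(addr):
    """Map an address to a building label by its /64 prefix."""
    for label, net in (("A", building_a), ("B", building_b)):
        if addr in net:
            return label
    return None
```

The only part a human has to remember is the shared /48 plus one hex digit per building.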

> Text representation of IPv4 is easy to detect. IPv6 representation? Good luck with that.

This regex will detect any IPv6 address:

    ([0-9A-Fa-f]*:){2}[0-9A-Fa-f:.]*
Then you can feed matches to a real address parser to filter out false positives.
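
For example, a minimal sketch of that two-step approach in Python, using the regex above as a loose candidate filter and the standard ipaddress module as the "real address parser". The function name find_ipv6 is made up:

```python
import ipaddress
import re

# The loose pattern from above: it over-matches on purpose.
CANDIDATE = re.compile(r"([0-9A-Fa-f]*:){2}[0-9A-Fa-f:.]*")


def find_ipv6(text):
    """Return the substrings of text that parse as IPv6 addresses."""
    found = []
    for match in CANDIDATE.finditer(text):
        candidate = match.group(0)
        try:
            ipaddress.IPv6Address(candidate)
        except ValueError:
            continue  # false positive, e.g. a timestamp like 12:30:45
        found.append(candidate)
    return found
```

Note that the character class includes "." and ":", so an address directly followed by sentence punctuation will be captured with the punctuation attached and then rejected by the parser; trimming trailing punctuation first would be a reasonable refinement.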

Even if ISPs aren't dragging their feet, home routers are lacking a huge number of features in their IPv6 stack, which are still needed even for basic users.

E.g.:

- My Mikrotik will not do any kind of routing acceleration for IPv6, so throughput on GBit FTTH (which is the standard offering here) will be significantly slower.

- There's no way to autoconfigure firewall (so no UPnP-like technology to enable voice calls and gaming).

- There's no way to statically assign addresses or autoconfigure firewall for automatically configured ones.

- There's no builtin way to push your own DNS to configure things like pihole.

So even a "simple" techy guy setup where you have a home NAS and a few machines that need to drill holes through the firewall is almost completely impossible on pretty much any router affordable for home use.

You're correct that the ISPs are dragging their feet, despite it allegedly being for their benefit - but exactly the same breakage risks apply. The cost/benefit tradeoff doesn't look good to them.

I just bought a new router. https://www.asus.com/uk/Networking/DSLAC68U/ It doesn't support IP6.

> I just bought a new router. https://www.asus.com/uk/Networking/DSLAC68U/ It doesn't support IP6. (sic)

That's not what your link says:

> while power users will love its IPv6 support

It sounds like you don't need to, and honestly, after all of these years I still don't know much about it either, because I haven't needed to.

One could argue that this is actually a significant benefit of IPv6, in practice.

cf Itanium vs amd64