by BiteCode_dev 2274 days ago
The reasons the transitions worked were that:

- the languages were not that different, so no need to learn the new version

- python core devs gave 10 years to make the transition. Then extended it.

- it was possible to write code running on python 2 and 3

- python is very expressive, hence code bases have far fewer lines than in C++

- python cared only about one implementation, CPython. The rest of the world needed to follow. Some didn't, like Jython and Stackless, and the community didn't blink.

Despite all that, the transition was very painful.


“Worked?” Nothing about the Python 2 to 3 transition “worked” and people are still aghast that Python 2 is being EOL’d.
I've been coding in Python for 15 years, and from what I see in the numerous places I get to work, yes, it worked.

It's just that aghast people are the most vocal. You don't hear the 90% of people that are happy. They don't take the time to speak up. But the unhappy complain all the time.

I’m not speaking about developers complaining, I’m speaking about the fact that it’s still a fractured ecosystem and plenty of popular programs, packages, and scripts still in use or even still actively developed rely on or integrate with Python 2 (exclusively).
> I’m not speaking about developers complaining, I’m speaking about the fact that it’s still a fractured ecosystem and plenty of popular programs, packages, and scripts still in use or even still actively developed rely on or integrate with Python 2 (exclusively).

Such as? What actively developed tools or ecosystems rely on or integrate with python2 exclusively?

There are people still using Windows XP.

There are programs still using COBOL.

That's life in IT.

Yeah I'm one of the happy ones. I always loathed Python 2 and text handling, Python 3 was a huge improvement from my perspective.
I'm several years into the "upgrade" and find myself still swearing daily at the idiocy of the whole thing. Hundreds of scripts used maybe once a year, such as 'dups.py' I tried to run today, broken by a missing parenthesis, and a function moved around.

Utterly pointless and reputationally ruinous. I don't do serious work in Python any more.

That has...nothing to do with python3 though, that's just your broken code being broken.
I recognize it may be difficult to understand, but this is a thread about backwards compatibility. Of course if I had any faith in the contemporary Python community, I would still be treating Python as a serious programming language and not be suggesting this is a difficult concept for someone to grasp.
I'm not sure what you're getting at. Perhaps if you could translate to less-smug, the rest of us could understand.
There's really not much to it, but if it's a foreign concept it may require a kind of conceptual leap. Essentially, this code worked perfectly well, and suddenly it no longer worked. From your perspective, as you quite effortlessly put it, this code is broken. For others elsewhere in more conservative parts of the world, it was not my code that broke, but the surrounding ecosystem.

There is a fabulously unending depth to explore in the chasm lying between these opposing world views. It would be more than possible to write a book on the topic and fail to cover it all, however here are some of the most important aspects, from my perspective at least:

* given a perfectly functional tool relied on heavily by its user to perform their job, and given that tool suddenly decides to change shape such that it no longer fits the user's hand without retraining, nor fits with the remainder of the user's toolset, including custom tools the user has invested in producing, the continued utility of the no-longer-functioning tool is called into question, along with a deserved reappraisal of the tool's applicability in the context of the user's original intended problem domain.

* when the reason for its reshaping is to solve what are highly important problems from the perspective of the tool, but much less so from the perspective of the average user, and that user's application of the tool to real world problems, it can no longer be said that the tool is simply an implement that may be called and relied upon at any point in future -- the tool develops a chaotic life and importance all of its own, and may choose to reshape once again at any future moment (and indeed in this case it has). It is no longer a tool, but some sentient entity demanding unpredictable ongoing costs and attention paid all of its own.

* given a tool that promises to cease functioning 'correctly' at any future moment based on its own whim, preferences, industry fashions and styles, in an ecosystem where many similar such tools exist that explicitly promise not to cease functioning over the same time period, it is a fool's errand to pick the tool that promises to externalize additional costs on the user when alternatives exist that avoid any such cost.

* given tool designers who externalize almost frivolously minor technical costs on to every user, where each 'minor' change is amplified perhaps 10000 times over and directly translates into expensive engineering time, the question is easily raised whether the philosophy of the tool is appropriate for its advertised utility, and whether continued reliance on the tool makes business sense. In economic terms, what was the cost to productivity of the retraining and re-tooling of users compared to any alleged future productivity improvement?

* had I written these scripts in bash, C# or C++, they would not have broken even remotely to the same degree. Of course these are not some completely unevolving entities either, however all take the promise of forwards compatibility deadly seriously, and it is more than possible to find 10-20 year old programs written in C++ or bash that continue functioning to the present day. From my perspective, they are therefore excellent and highly dependable tools.

print vs print() is what I think the parent comment was referring to.
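For anyone who hasn't hit this, here is a rough sketch of the two kinds of breakage mentioned upthread: print becoming a function, and a function moving to another module. The thread never shows 'dups.py', so the helper below is hypothetical, and `reduce` is only a well-known example of a function that moved (from a builtin to `functools`).

```python
# Runs unchanged on Python 2.6+ and on Python 3.
from __future__ import print_function  # makes print() a function on Python 2

# reduce() was a builtin in Python 2 and lives in functools in Python 3;
# importing it from functools works on both.
from functools import reduce


def total_size(sizes):
    """Sum a list of file sizes (hypothetical helper, not from dups.py)."""
    return reduce(lambda a, b: a + b, sizes, 0)


# Python 2 only:  print "total:", total_size([1, 2, 3])
# Both versions:
print("total:", total_size([1, 2, 3]))
```

The fix is mechanical, but it still has to be applied to every script, which is the cost being argued about here.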
The Unicode issue is one of the most painful sticking points. Need to handle legacy filesystems whose previously valid file names (anything without / or \0) come in an encoding soup nightmare? Python3 is NOT the tool for that job! Which means you can't write any systems tool stuff in Python3 because there's a good chance it'll blow up unexpectedly, or that you suddenly have to handle all sorts of things that in any other language you can just GIGO and move on.
Python 3 promotes the most likely scenario: you are on a modern system with normal looking filenames.

It would be unwise to design an API that promotes a niche need like dealing with a legacy file system with corrupted file names. This is what Python 3 fixed, making the easy things easy, and the complicated things possible, not the other way around.

But Python 3 is absolutely up to the task of handling legacy file systems with random encoding mixed in, you just need to tell it explicitly you are doing so.

Let's create a file with a completely garbage name, made of random bytes, which is allowed on Unix:

    >>> import os, sys
    >>> sys.version_info          
    sys.version_info(major=3, minor=7, micro=5, releaselevel='final', serial=0)
    >>> with open(os.urandom(32), 'wb') as f: 
    ...     f.write(os.urandom(200)) 
If you pass bytes to any file system function, it will return file names as bytes:

    >>> filename = os.listdir(b'.')[0]
    >>> type(filename)
    <class 'bytes'>
    >>> filename[:10]   
    b'\xf8-U\xa5\x1dq\xad?\xbf\xa2'
And you can just open that:

    >>> data = open(filename, 'rb').read() 
    >>> data[:10]
    b'E\x05\xce*M \xf5\xfeK\x18'
    >>> type(data)
    <class 'bytes'>
You do exactly the same as what you did with Python 2, and treat the files as raw bytes entirely, without thinking about the content.

Some APIs in Python require text. If you want to pass the file names to those APIs, you can use surrogateescape, which lets you convert back and forth between arbitrary bytes and utf8 text without losing information:

    >>> as_text = filename.decode('utf8', errors='surrogateescape')
    >>> as_text[:10]
    '\udcf8-U\udca5\x1dq\udcad?\udcbf\udca2'
    >>> type(as_text)
    <class 'str'>
    >>> as_text.encode('utf8',  errors='surrogateescape') == filename
    True
This is a good thing: it forces the dev to be explicit about the places in the code that deal with this specific scenario. It also makes the price of doing so opt-in, not opt-out.

If you have to do a robust version of this with Python 2, you will have to do that anyway: at some point mixed encoding will bite you if you don't have a neutral representation for them. Python 2 gave you the illusion of robustness, because it said "yes" to most operations.

I remember quite well that a lot of Python 2 programs didn't work in Europe because your user directory would contain your name, which could be non-ASCII. In Python 2, dealing with that was opt-in. It's the opposite philosophy, and it caused so many crashes.

I failed to mention that it's only useful to do this manually if you need to decode paths with mixed encodings, or share said paths with the rest of the world.

If all you need is to work with arbitrary mixed bags of file names, you can just pass "str", and Python will automatically and transparently use surrogateescape everywhere.
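A minimal sketch of that implicit behaviour, assuming a POSIX system whose file system accepts arbitrary bytes in names (typical on Linux; some other platforms may refuse to create the file). The file name is a made-up example: valid Latin-1, invalid UTF-8. On a UTF-8 locale the listed name should contain a lone surrogate in place of the undecodable byte.

```python
import os
import tempfile

# Hypothetical file name: valid Latin-1, but not valid UTF-8.
raw_name = b"caf\xe9.txt"

with tempfile.TemporaryDirectory() as tmp:
    # Create the file through the bytes API.
    with open(os.path.join(os.fsencode(tmp), raw_name), "wb") as f:
        f.write(b"data")

    # List the directory through the str API: no exception is raised,
    # undecodable bytes are escaped rather than rejected.
    (name,) = os.listdir(tmp)
    print(repr(name))

    # The str name can be passed straight back to file system functions.
    with open(os.path.join(tmp, name), "rb") as f:
        content = f.read()

    # os.fsencode() reverses the escaping and recovers the original bytes.
    round_trip = os.fsencode(name)

print(round_trip == raw_name, content)
```

The escaped name only becomes a problem when it leaves the file system APIs, for example when you try to encode it as strict UTF-8 to print or store it.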

I think that the main complaint is that the early versions of python 3 had deep issues with this and it took years to fix them.
The first versions of python 3 were indeed not suitable for serious work. Python 3 started to be usable from 3.3, interesting from 3.4, comfortable from 3.5, and objectively way better than 2.7 from 3.6.

It's not a surprise, as most software needs iterations to get good. Python 2.7 did not start as the amazing tool it is now, and as I started my career with 2.4, I remember some funny stuff.

This is why Python 2 was kept around for more than 11 years after Python 3 first came out.

Now, 3.6 came out in 2016. It solves many issues 2.7 had, and adds tons of goodies. It's very ergonomic and can be installed easily. It's great software.

It's 2020, let's enjoy the goodness of Python 3.

But this is exactly why people keep piling on about python 3: as an "upgrade" it was worse than what it tried to replace for 8 years.
for a long time it was impossible to write code compatible with both python 2 and 3
It started to be possible with Python 3.3, which came out 8 years ago.

We are at Python 3.8.

Python 3 has been around for more than 11 years, during which Python 2 was still supported.

> python is very expressive, hence code bases have far fewer lines than in C++

This doesn't match reality. In reality, many Python projects have way more lines of code than equivalent C++ projects. Probably because of the 'expressiveness' you cite; you can't really showcase your love of coding and job security through artificial complexity without 'expressive' bells and whistles. (This is the idea that led to languages like Go, I'm pretty sure.)

That said, C++ is plenty 'expressive' itself.

> many Python projects have way more lines of code than equivalent C++ projects.

That is mathematically impossible. Even if the syntaxes were exactly the same (which they are not, python syntax is on average shorter), the low level nature of C++ requires your code to do operations that Python does not need to do, such as memory management.

It's like stating the sky is red.

Once you've written sufficient unit tests to prove your code works as well as a C++ equivalent does just by compiling, I don't think Python ends up being more dense though.

I'd also argue that operator overloading etc lets C++ be just as expressive as Python, the libraries just need to be designed with that in mind.

> Once you've written sufficient unit tests to prove your code works as well as a C++ equivalent does just by compiling

You mean in the same way C++ devs write tests to prove that their code has no memory errors, which you get in Python for free?

Except:

- tests are way shorter to write in python than in C++

- C++ devs often write zero tests for their code, just like python devs

- python duck typing + REPL means compiler checks are rarely necessary

- if you need type checks, you use type hints in python, which even then is still less verbose than c++
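To make the points about tests and type hints concrete, here is a minimal sketch. The function `mean_age` and its data are made up for illustration, and mypy is named only as one example of a static checker.

```python
from typing import List


def mean_age(ages: List[int]) -> float:
    """Average of a non-empty list of ages."""
    return sum(ages) / len(ages)


def test_mean_age() -> None:
    # A plain assert is a complete test under a runner such as pytest.
    assert mean_age([23, 32]) == 27.5


test_mean_age()

# A static checker such as mypy would flag a call like mean_age("23")
# before the code runs; the interpreter itself ignores the annotations.
```

The annotations are optional, so you only pay for them in the parts of the code base where you want the checks.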

> I'd also argue that operator overloading etc lets C++ be just as expressive as Python, the libraries just need to be designed with that in mind.

Ok, let's say you have this json:

    [{
        "name": "Kévin",
        "age": 23,
        "hired": "2005-06-03 02:12:33",
        "emails": ["kevin@foo.com", "kevin@bar.com"]
    }, {
        "name": "Joël",
        "age": 32,
        "hired": "2003-01-02 12:32:11",
        "emails": ["joel@foo.com", "joel@bar.com"]
    },
    ... other entries
    ]

It's very simple. Very basic. There is no trick in there: it's standard utf8, well formed, no missing value.

You want to print people details in alphabetical order this way:

    Joël (32) - 02/01/03:
    - joel@foo.com
    - joel@bar.com
    Kévin (23) - 03/06/05:
    - kevin@foo.com
    - kevin@bar.com
    ... other entries
This is a first-year college exercise. Nothing remotely complicated. I'm not choosing some fancy machine learning or data processing stuff for which Python has magic libs. Every language can do that easily.

In Python 3.7, which is already 2 years old, the code would be:

    import json
    import datetime as dt

    with open("agenda.json", encoding="utf8") as fd:

        agenda = sorted(json.load(fd), key=lambda people: people["name"])

        for people in agenda:

            hired = dt.datetime.fromisoformat(people["hired"])
            print(f'{people["name"]} ({people["age"]}) - {hired:%d/%m/%y}:')

            for email in people["emails"]:
                print(f" - {email}")
The entire script is there. There is no trick. This is not a code golf version of it; I could make it shorter. It really is standard Python. There is no 3rd party lib either.

It's not specific to Python, you would get this expressiveness with Ruby or Perl.

I don't see in which world you would get that in regular, honest to god, day to day, portable C++.

You have to declare types, many includes, you'll have headers and a main function. You have the memory and references to manage.

It doesn't make C++ a bad language.

It doesn't make python a better language.

The C++ version will take way less RAM than the Python version for example.

It's just that the nature of those languages implies that.

For fun, I whipped this up in Rust. I decided to go with an explicit struct to serialize it into, because it makes the error handling easier, and is a bit more idiomatic. I kept unwrap because it's similar to the python semantic of throwing an exception. It's pretty close though!

    use chrono::NaiveDateTime;
    use serde::*;
    use serde_json;
    
    #[derive(Deserialize)]
    struct Person {
        name: String,
        age: u32,
        hired: String,
        emails: Vec<String>,
    }
    
    fn main() {
        let data = std::fs::read_to_string("agenda.json").unwrap();
        let mut people: Vec<Person> = serde_json::from_str(&data).unwrap();
    
        people.sort_by(|a, b| a.name.cmp(&b.name));
    
        for person in &people {
            let datetime = NaiveDateTime::parse_from_str(&person.hired, "%Y-%m-%d %H:%M:%S").unwrap();
            println!(
                "{} ({}) - {}:",
                person.name,
                person.age,
                datetime.format("%d/%m/%y")
            );
    
            for email in &person.emails {
                println!(" - {}", email);
            }
        }
    }
Apparently you can deserialize the date directly with serde and chrono too! So this could be even shorter.
And Rust is very, very expressive and concise for a bare metal language. Serde is a beautiful thing.
It's a bit longer in C++ but frankly not by that much:

    #include <iostream>
    #include <sstream>
    #include <fstream>
    #include <iomanip>
    #include <string>
    #include <vector>
    #include <nlohmann/json.hpp>
    #include <range/v3/action/sort.hpp>
    
    int main()
    {
      using namespace nlohmann;
      using namespace ranges;
    
      const json parsed = json::parse(std::ifstream("/tmp/json/test.json"));
    
      std::vector agenda(parsed.begin(), parsed.end());
      sort(agenda, {}, [] (const auto& j) { return j["name"]; });

      for(const auto& people : agenda) try {
        std::tm t{};
        std::istringstream(people["hired"].get<std::string>()) >> std::get_time(&t, "%Y-%m-%d %H:%M:%S");
    
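        // get<std::string>() prints the raw text; streaming a json string directly would add quotes
        std::cout << people["name"].get<std::string>() << " (" << people["age"] << ") - " << std::put_time(&t, "%d/%m/%y") << ":\n";
        for(const auto& email : people["emails"])
          std::cout << " - " << email.get<std::string>() << "\n";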
      } catch (...) { }
    }
This is getting ridiculous.

We went from:

> many Python projects have way more lines of code than equivalent C++ projects

To:

> It's a bit longer in C++ but frankly not by that much

With the proof being literally __twice__ as many characters, with one line being more than 100 characters long.

I understand that C++ doesn't have a native JSON lib, and sorting is a bit out there, but giving 3rd party lib access to Python feels like cheating:

    import pandas as pd
    agenda = pd.read_json("agenda.json", convert_dates=["hired"])
    for _, (name, age, hired, emails) in agenda.sort_values("name").iterrows():
        print(f"{name} ({age}) - {hired:%d/%m/%y}:")
        for email in emails:
            print(f" - {email}")
I mean, I can also create a lib that does the entire script, install it, and just do 'import answerhn; answerhn.print_json("agenda.json")'

Now again, this is not to say "Python is better than C++". That's not the point.

If you want to write a game engine, C++ makes sense, verbosity doesn't matter, and you don't want the Python GC to kick in. If you want to iterate on your SaaS product API quickly, it's probably not the best choice.

Why pretend otherwise? What's the point of stating the sky is red?

It also requires a library outside of the standard.
The original post was about 'projects', not code snippets.

If you've done any serious (large, long, multi-team) projects in a dynamically-typed language, you know how quickly they turn into a big ball of mud where you're spending more time refactoring your refactorings than writing useful code.