Hacker News
by bruxis 2272 days ago
Agreed. I think looking at the Python 2 to 3 migration catastrophe gives a glimpse of what this could look like, but I imagine it would be much worse given the kinds of (large) projects that are backed by substantially "dated" C/C++ code.

You could do much better than the Python 2 to 3 transition.

If the Python 3 interpreter could still run Python 2 code, and if you could mix and match Python 2 and 3 code on a per-module or even per-file basis, the transition would have been much smoother.

The reasons the transition worked were that:

- the languages were not that different, so there was no need to learn a new one

- Python core devs gave everyone 10 years to make the transition, then extended the deadline.

- it was possible to write code that runs on both Python 2 and 3

- Python is very expressive, so code bases have far fewer lines than in C++

- Python cared about only one implementation, CPython; the rest of the world had to follow. Some didn't, like Jython and Stackless, and the community didn't blink.

Despite all that, the transition was very painful.
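For readers who never had to do it, here is a minimal sketch of the "code running on both Python 2 and 3" style from the third bullet. It assumes Python 2.7 or 3.3+; the `__future__` imports and the try/except import fallback are the usual idioms, while `pair_up` and `average` are hypothetical helpers made up for illustration.

```python
# Runs unchanged on Python 2.7 and Python 3.x: the "single code base" style.
from __future__ import absolute_import, division, print_function, unicode_literals

try:
    # Python 2 spelling
    from itertools import izip_longest as zip_longest
except ImportError:
    # Python 3 renamed it
    from itertools import zip_longest

try:
    text_type = unicode  # noqa: F821 (only defined on Python 2)
except NameError:
    text_type = str  # Python 3: str is the text type


def pair_up(names, ages):
    """Pair two lists, padding the shorter one with None (hypothetical helper)."""
    return list(zip_longest(names, ages))


def average(values):
    """True division on both versions thanks to the __future__ import."""
    return sum(values) / len(values)


if __name__ == "__main__":
    print(pair_up(["Kevin", "Joel", "Anna"], [23, 32]))
    print(average([3, 4]))  # 3.5 on both versions, not 3
    print(isinstance("Kevin", text_type))  # literals are text on both versions
```

The cost is visible: every renamed module or changed default needs a shim like the ones above, multiplied across a whole code base.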

“Worked?” Nothing about the Python 2 to 3 transition “worked”, and people are still aghast that Python 2 is being EOL’d.
I've been coding in Python for 15 years, and from what I see in the numerous places I get to work, yes, it worked.

It's just that aghast people are the most vocal. You don't hear the 90% of people that are happy. They don't take the time to speak up. But the unhappy complain all the time.

I’m not speaking about developers complaining, I’m speaking about the fact that it’s still a fractured ecosystem and plenty of popular programs, packages, and scripts still in use or even still actively developed rely on or integrate with Python 2 (exclusively).
> I’m not speaking about developers complaining, I’m speaking about the fact that it’s still a fractured ecosystem and plenty of popular programs, packages, and scripts still in use or even still actively developed rely on or integrate with Python 2 (exclusively).

Such as? What actively developed tools or ecosystems rely on or integrate with Python 2 exclusively?

There are people still using Windows XP.

There are programs still using COBOL.

That's life in IT.

Yeah, I'm one of the happy ones. I always loathed Python 2's text handling; Python 3 was a huge improvement from my perspective.
I'm several years into the "upgrade" and still find myself swearing daily at the idiocy of the whole thing. Hundreds of scripts are used maybe once a year, such as 'dups.py', which I tried to run today and found broken by a missing parenthesis and a function that had moved.

Utterly pointless and reputationally ruinous. I don't do serious work in Python any more.

That has... nothing to do with Python 3, though; that's just your broken code being broken.
I recognize it may be difficult to understand, but this is a thread about backwards compatibility. Of course, if I had any faith in the contemporary Python community, I would still be treating Python as a serious programming language and would not be suggesting this is a difficult concept for someone to grasp.
I'm not sure what you're getting at. Perhaps if you could translate to less-smug, the rest of us could understand.
print vs print() is what I think the parent comment was referring to.
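To make that concrete, here is a small sketch that asks the running Python 3 interpreter whether a snippet even compiles. `compiles_under_py3` is a hypothetical helper; the point is that the Python 2 print statement is rejected at compile time, so an old script like the `dups.py` mentioned above fails before any of it runs.

```python
def compiles_under_py3(source):
    """Return True if `source` compiles under the running Python 3 interpreter."""
    try:
        compile(source, "<old_script>", "exec")
    except SyntaxError:
        return False
    return True


# Python 2 statement form: a SyntaxError on Python 3, so nothing in the file runs.
print(compiles_under_py3('print "found duplicates"'))  # False
# Function form: valid on Python 2.7 (as a parenthesized expression) and Python 3.
print(compiles_under_py3('print("found duplicates")'))  # True
```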
The Unicode issue is one of the most painful sticking points. Handling legacy filesystems that may contain previously valid filenames (anything without '/' or '\0') in a nightmare soup of encodings? Python 3 is NOT the tool for that job! Which means you can't write any systems tools in Python 3, because there's a good chance they'll blow up unexpectedly, or that you suddenly have to handle all sorts of things that in any other language you could just GIGO and move on from.
Python 3 optimizes for the most likely scenario: you are on a modern system with normal-looking filenames.

It would be unwise to design an API around a niche need like dealing with a legacy file system with corrupted file names. This is what Python 3 fixed: making the easy things easy and the complicated things possible, not the other way around.

But Python 3 is absolutely up to the task of handling legacy file systems with random encodings mixed in; you just need to tell it explicitly that you are doing so.

Let's create a file with a completely garbage name made of random bytes, which most Unix filesystems allow as long as the name contains no '/' or NUL byte:

    >>> import os, sys
    >>> sys.version_info
    sys.version_info(major=3, minor=7, micro=5, releaselevel='final', serial=0)
    >>> name = bytes(b for b in os.urandom(32) if b not in b'/\0')  # '/' and NUL are not allowed
    >>> with open(name, 'wb') as f:
    ...     f.write(os.urandom(200))
    ...
    200
If you pass bytes to the file system functions in os, they return file names as bytes:

    >>> filename = os.listdir(b'.')[0]
    >>> type(filename)
    <class 'bytes'>
    >>> filename[:10]   
    b'\xf8-U\xa5\x1dq\xad?\xbf\xa2'
And you can just open that:

    >>> data = open(filename, 'rb').read() 
    >>> data[:10]
    b'E\x05\xce*M \xf5\xfeK\x18'
    >>> type(data)
    <class 'bytes'>
You do exactly what you did with Python 2: treat the file names and contents as raw bytes, without thinking about their encoding.

Some Python APIs require text. If you want to pass the file names to those APIs, you can use the surrogateescape error handler, which lets you convert back and forth between arbitrary bytes and str without losing information:

    >>> as_text = filename.decode('utf8', errors='surrogateescape')
    >>> as_text[:10]
    '\udcf8-U\udca5\x1dq\udcad?\udcbf\udca2'
    >>> type(as_text)
    <class 'str'>
    >>> as_text.encode('utf8',  errors='surrogateescape') == filename
    True
This is a good thing: it forces you to be explicit about the places in your code where you are dealing with this specific scenario. It also makes paying the price for it opt-in, not opt-out.

If you need a robust version of this in Python 2, you have to do that work anyway: at some point mixed encodings will bite you if you don't have a neutral representation for them. Python 2 gave you the illusion of robustness, because it said "yes" to most operations.

I remember quite well that a lot of Python 2 programs didn't work in Europe because the user directory contained the user's name, which could be non-ASCII. In Python 2, dealing with that was opt-in. It's the opposite philosophy, and it caused so many crashes.
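A sketch of that difference in behaviour, run on Python 3. On Python 2, mixing unicode and byte strings triggered an implicit ASCII decode that only failed on non-ASCII data such as "Kévin", so the bug hid until a European user showed up. Python 3 refuses the mix for every input, which forces an explicit decode at the boundary. `mixes_silently` and `greet` are hypothetical helpers for illustration.

```python
def mixes_silently(raw_name):
    """Return True if str + bytes is accepted; on Python 3 it never is."""
    try:
        _ = "Hello, " + raw_name
    except TypeError:
        return False
    return True


def greet(raw_name):
    """Decode explicitly at the boundary; surrogateescape keeps undecodable bytes."""
    return "Hello, " + raw_name.decode("utf-8", errors="surrogateescape")


print(mixes_silently(b"Kevin"))        # False: even pure-ASCII bytes are rejected
print(greet("Kévin".encode("utf-8")))  # Hello, Kévin
print(ascii(greet(b"K\xe9vin")))       # 'Hello, K\udce9vin' (Latin-1 byte kept as a surrogate)
```

The failure shows up in the first test run with ASCII data, not months later on a user's machine.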

I failed to mention that it's only useful to do this manually if you need to decode paths with mixed encodings, or share those paths with the rest of the world.

If all you need is to work with arbitrary mixed bags of file names, you can just pass str, and on POSIX systems Python will automatically and transparently use surrogateescape everywhere.
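A minimal sketch of that str-only path, assuming Linux with a filesystem such as ext4 that accepts arbitrary bytes in names (some filesystems, for example on macOS, may refuse non-UTF-8 names). The file name `report-\xff\xfe.txt` and the helper `demo_str_paths` are made up for illustration; `os.fsencode` shows the str mapping back to the exact original bytes.

```python
import os
import sys
import tempfile


def demo_str_paths():
    """Create a non-UTF-8 file name via bytes, then find and open it using only str."""
    raw_name = b"report-\xff\xfe.txt"  # hypothetical name; not valid UTF-8
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(os.fsencode(tmp), raw_name), "wb") as f:
            f.write(b"data")
        # str in, str out: undecodable bytes come back as lone surrogates.
        name = os.listdir(tmp)[0]
        # open() re-encodes the str with surrogateescape, reaching the same file.
        with open(os.path.join(tmp, name), "rb") as f:
            content = f.read()
        return name, os.fsencode(name), content


if __name__ == "__main__":
    # On Linux this is typically "utf-8 surrogateescape".
    print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
    name, encoded, content = demo_str_paths()
    print(ascii(name), encoded, content)
```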

I think the main complaint is that the early versions of Python 3 had deep issues with this, and it took years to fix them.
The first versions of Python 3 were indeed not suitable for serious work. Python 3 started to be usable from 3.3, interesting from 3.4, comfortable from 3.5, and objectively way better than 2.7 from 3.6.

It's not a surprise, as most software needs iterations to get good. Python 2 didn't start out as the amazing tool 2.7 is now, and since I started my career with 2.4, I remember some funny stuff.

This is why Python 2 was kept around for more than 11 years after Python 3 first came out (December 2008 to January 2020).

Now, 3.6 came out in 2016. It solves many issues 2.7 had and adds tons of goodies. It's very ergonomic and easy to install. It's great software.

It's 2020, let's enjoy the goodness of Python 3.

For a long time it was impossible to write code compatible with both Python 2 and 3.
It became practical with Python 3.3, which restored the u'' string prefix (PEP 414) and came out about 8 years ago.

We are at Python 3.8.

Python 3 has been around for 11 years, during which Python 2 was still supported.

> Python is very expressive, so code bases have far fewer lines than in C++

This doesn't match reality. In reality, many Python projects have way more lines of code than equivalent C++ projects, probably because of the 'expressiveness' you cite: you can't really showcase your love of coding and secure your job through artificial complexity without 'expressive' bells and whistles. (This is the idea that led to languages like Go, I'm pretty sure.)

That said, C++ is plenty 'expressive' itself.

> many Python projects have way more lines of code than equivalent C++ projects.

That is mathematically impossible. Even if the syntaxes were exactly the same (which they are not; Python syntax is shorter on average), the low-level nature of C++ requires your code to do operations that Python doesn't need you to do, such as memory management.

It's like stating the sky is red.

Once you've written sufficient unit tests to prove your code works as well as a C++ equivalent does just by compiling, I don't think Python ends up being more dense though.

I'd also argue that operator overloading etc lets C++ be just as expressive as Python, the libraries just need to be designed with that in mind.

> Once you've written sufficient unit tests to prove your code works as well as a C++ equivalent does just by compiling

You mean in the same way C++ devs write tests to prove that their code has no memory errors, which you get for free in Python?

Except:

- tests are way shorter to write in python than in C++

- C++ devs often write zero tests for their code, just like python devs

- Python's duck typing plus the REPL means compiler checks are rarely necessary

- if you need type checks, you use type hints in Python, which are still less verbose than C++
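A small sketch of the last two points: a hinted function plus a complete pytest-style test, which is just a function with a bare `assert`. The hints are checked by an external tool such as mypy, not by the interpreter at run time; `emails_by_domain` is a hypothetical example.

```python
from typing import Dict, List


def emails_by_domain(emails: List[str]) -> Dict[str, List[str]]:
    """Group addresses by domain (hypothetical example); mypy checks the hints, not the interpreter."""
    groups: Dict[str, List[str]] = {}
    for email in emails:
        domain = email.rsplit("@", 1)[-1]
        groups.setdefault(domain, []).append(email)
    return groups


def test_emails_by_domain() -> None:
    # The whole test: pytest collects test_* functions and reports failed asserts.
    assert emails_by_domain(["kevin@foo.com", "joel@foo.com", "kevin@bar.com"]) == {
        "foo.com": ["kevin@foo.com", "joel@foo.com"],
        "bar.com": ["kevin@bar.com"],
    }
```

Running `pytest` on the file picks up `test_emails_by_domain` automatically; running `mypy` on it would flag a call such as `emails_by_domain("kevin@foo.com")`, since a str is not a `List[str]`.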

> I'd also argue that operator overloading etc lets C++ be just as expressive as Python, the libraries just need to be designed with that in mind.

OK, let's say you have this JSON:

    [{
        "name": "Kévin",
        "age": 23,
        "hired": "2005-06-03 02:12:33",
        "emails": ["kevin@foo.com", "kevin@bar.com"]
    }, {
        "name": "Joël",
        "age": 32,
        "hired": "2003-01-02 12:32:11",
        "emails": ["joel@foo.com", "joel@bar.com"]
    },
    ... other entries
    ]

It's very simple. Very basic. There is no trick in there: it's standard UTF-8, well-formed, with no missing values.

You want to print each person's details in alphabetical order, like this:

    Joël (32) - 02/01/03:
     - joel@foo.com
     - joel@bar.com
    Kévin (23) - 03/06/05:
     - kevin@foo.com
     - kevin@bar.com
    ... other entries
This is a first-year college exercise. Nothing remotely complicated. I'm not choosing some fancy machine learning or data processing task for which Python has magic libs. Every language can do this easily.

In Python 3.7, which is already 2 years old, the code would be:

    import json
    import datetime as dt

    with open("agenda.json", encoding="utf-8") as fd:

        agenda = sorted(json.load(fd), key=lambda people: people["name"])

        for people in agenda:

            hired = dt.datetime.fromisoformat(people["hired"])
            print(f'{people["name"]} ({people["age"]}) - {hired:%d/%m/%y}:')

            for email in people["emails"]:
                print(f" - {email}")
The entire script is there. There is no trick. This is not a code-golf version of it; I could make it shorter. It really is standard Python. There is no third-party lib either.

It's not specific to Python; you would get this expressiveness with Ruby or Perl.

I don't see in which world you would get that in regular, honest to god, day to day, portable C++.

You have to declare types, add many includes, and write headers and a main function. You have memory and references to manage.

It doesn't make C++ a bad language.

It doesn't make python a better language.

The C++ version will take way less RAM than the Python version for example.

It's just the nature of those languages implies that.

For fun, I whipped this up in Rust. I went with an explicit struct to deserialize into, because it makes the error handling easier and is a bit more idiomatic. I kept unwrap because it's similar to Python's semantics of raising an exception. It's pretty close though!

    use chrono::NaiveDateTime;
    use serde::*;
    use serde_json;
    
    #[derive(Deserialize)]
    struct Person {
        name: String,
        age: u32,
        hired: String,
        emails: Vec<String>,
    }
    
    fn main() {
        let data = std::fs::read_to_string("agenda.json").unwrap();
        let mut people: Vec<Person> = serde_json::from_str(&data).unwrap();
    
        // Ascending order by name, like Python's sorted().
        people.sort_by(|a, b| a.name.cmp(&b.name));
    
        for person in &people {
            let datetime = NaiveDateTime::parse_from_str(&person.hired, "%Y-%m-%d %H:%M:%S").unwrap();
            println!(
                "{} ({}) - {}:",
                person.name,
                person.age,
                datetime.format("%d/%m/%y")
            );
    
            for email in &person.emails {
                println!(" - {}", email);
            }
        }
    }
It's a bit longer in C++, but frankly not by that much:

    #include <ctime>
    #include <iostream>
    #include <sstream>
    #include <fstream>
    #include <iomanip>
    #include <string>
    #include <vector>
    #include <nlohmann/json.hpp>
    #include <range/v3/algorithm/sort.hpp>
    
    int main()
    {
      using namespace nlohmann;
      using namespace ranges;
    
      const json parsed = json::parse(std::ifstream("agenda.json"));
    
      // Copy the array elements, then sort them by the "name" field (projection).
      std::vector agenda(parsed.begin(), parsed.end());
      sort(agenda, {}, [] (const auto& j) { return j["name"]; });

      for(const auto& people : agenda) try {
        std::tm t{};
        std::istringstream(people["hired"].get<std::string>()) >> std::get_time(&t, "%Y-%m-%d %H:%M:%S");
    
        // get<std::string>() prints the bare text; streaming a json string would add quotes.
        std::cout << people["name"].get<std::string>() << " (" << people["age"] << ") - "
                  << std::put_time(&t, "%d/%m/%y") << ":\n";
        for(const auto& email : people["emails"])
          std::cout << " - " << email.get<std::string>() << "\n";
      } catch (...) { }  // silently skips entries that throw, e.g. a field of the wrong type
    }
The original post was about 'projects', not code snippets.

If you've done any serious (large, long, multi-team) projects in a dynamically-typed language, you know how quickly they turn into a big ball of mud where you're spending more time refactoring your refactorings than writing useful code.

I agree about the "rocky transition" from Python 2 to 3. Still, if the Python migration story says anything about the outcome of an eventual C++ transition, the new language would be an even more massive success; Python 3.x seems to be doing alright, right?
That is somewhere in the logical region of saying 'Stephen Hawking was really smart, maybe amyotrophic lateral sclerosis is a good thing'. The transition to Python 3 was not pretty. Still isn't; Python 2 documentation features prominently when I'm trying to look up information.

To abandon backwards compatibility in C++ is to make a mockery of even the name of the language (see the "C" in there). If they want to create a new language, they should call it something different. Willfully abandoning backwards compatibility while keeping the name is an abuse of one of the great brands in software.

I was just trying to get a response to how the Python transition mentioned was so (edit:) "catastrophic". I don't understand the need to bring fiction writers, fatal diseases or laden words like "mockery" or "abuse" to the debate.
Stephen Hawking was not a fiction writer. Stephen King does not have a horrible disease.
sorry, my bad
> Python 2 documentation features prominently when I'm trying to look up information.

I consider this more a case of the ossification of search.

Neither search engines nor specific sites like Stack Overflow deal with the fact that information can switch from right to wrong with age.

Yeah, I think Python 2 has been successfully killed at this point. There are a decent number of projects which still need to migrate (and I recently became responsible for a couple), but it’s been a long time since I heard anyone claim they can keep Python 2 forever. The maintenance burden of keeping Python 2 is rapidly increasing as libraries drop support.
Since Python 3 has many more users than Python 2, this is not a relevant comparison. It was not smooth, but now it’s done and it worked. There have been many C++ forks which have failed to replace it, e.g. D, Rust, ...
D and Rust are not forks of C++, they're entirely different languages.