by the__alchemist 598 days ago
This is a good question, and I think about it as well. My best guess for a simple explanation: Python is very popular; it makes sense to improve performance for Python users, given that many do not wish to learn a more performant language or to use a more performant Python implementation. Becoming proficient in a range of tools so you can use the right one for the right job is high enough friction that it is not the path chosen by many.

You should really add that Python is also a very good tool for people who know more performant languages. One side that often gets forgotten is that a lot of software will never actually need to be very performant, and often you’re not going to know the bottlenecks beforehand. If you even get to the bottlenecks, it means you’ve succeeded enough to get there, which you might not have done if you had over-engineered things before you needed to.
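
To make the "you won't know the bottlenecks beforehand" point concrete, here is a minimal sketch that measures first, using the standard library's cProfile. The functions `slow_part`, `fast_part`, and `handle_request` are made up for illustration; only the profiling calls are real standard-library API.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Hypothetical hot spot: deliberately quadratic work.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def fast_part(n):
    # Hypothetical cheap step: linear work.
    return sum(range(n))


def handle_request(n):
    # Hypothetical top-level task that mixes both kinds of work.
    return slow_part(n) + fast_part(n)


def profile_report(n=200, top=5):
    """Profile handle_request(n) and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    handle_request(n)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

If the report shows most of the cumulative time sitting in one function, that function is the candidate for a rewrite in something faster, and the rest can stay in Python.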

What makes Python brilliant is that it’s easy to deliver on business needs. It’s easy to include people who aren’t actually software engineers but can write Python to do their stuff. It’s easy to make that Wild West code sane. Most importantly, however, it’s extremely easy to replace parts of your Python code with something like C (or Zig).

So even if you know performant languages, you can still use Python for most things and then as glue for heavy computation.

Now I may have made it sound like I think Python is brilliant so I’d like to add that I actually think it’s absolute trash. Loveable trash.

> it’s extremely easy to replace parts of your Python code with something like C

I tend to use C++, so I use SWIG [1] to generate Python code that interfaces with C++ (or C). You can nearly just give it a header file, and a Python class pops out, with native types and interfaces. It's really magical.

[1] https://www.swig.org

I do think not being able to use 32 cores easily is a gap in the current language. In 2017 I rewrote a fairly high-performance Python dialog-to-Kafka daemon in Go. The Python version required a lot of specialized knowledge to write (gevent, PyPy, hand-optimizing our framework for the hot paths, etc.) and still only used a few cores to do work.

The Go version was a dead-simple, my-first-Go-project sort of implementation, and it used all 32 cores and therefore worked much better right out of the gate. (I did have goroutine worker pools for each step of the processing, but the division of work into stages was already in the Python code.)

So yeah, Python makes it easy for lots of people to make less of a mess, until you want to use all your cores (which, again, is what takes rerunning your things from four minutes to 15 seconds on that fancy laptop).

Oh yeah, I totally get the motivation behind it. It's always very tempting to want to make things faster. But I can't help but wonder whether these attempts to make it faster might end up just making it worse.

On the other hand though, Python is so big and there's so many corps using it with so much cash that maybe they can get away with just breaking shit every few releases and people will just go adapt packages to the changes.

I think I am much happier to redo my personal scripts to stay at the bleeding edge than to rewrite old code at work to stay on supported versions. For corporations, wanton breaking changes mean that working code, which carries a non-zero risk every time it changes, needs budget spent on it just to stay in place.

Just this week I failed to convince a team to migrate their ten-year-old maintenance-mode daemon to Python 3.

Python famously has a community that does NOT adapt to changes well. See the Python 2 to 3 transition.

That was, in many ways, a crazy difficult transition. I don't think most languages have gone through such a thing. Perl tried and died. So I don't agree that it reflects poorly on the community; I think the plan itself was too ambitious.

Many languages have. C++ had significant breaks when the std::string ABI changed, Swift has had major language changes, and Rust has editions.

The difference is in what motivates getting to the other end of that transition bump and how big the bump is. That’s why it took until past 2.7’s EOL to actually get people onto 3 in a big way: people drag their feet if they don’t see a big enough change.

Compiled languages have it easier because they don’t need to mix source between dependencies, they just have to be ABI compatible.

Python's community was significantly smaller and less flush with cash during the 2 to 3 transition. Since then there have been numerous 3.x releases that were breaking, and people seem to have been sucking it up and dealing with it quietly so far.

The main thing is that unlike the 2 to 3 transition, they're not breaking syntax (for the most part?), which everyone experiences and has an opinion on, they're breaking rather deep down things that for the most part only the big packages rely on so most users don't experience it much at all.

I disagree with this entire comment.

The Python community consisted of tons of developers including very wealthy companies. At what point in the last few years would you even say they became “rich enough” to do the migration? Because people are STILL talking about trying to fork 2.7 into a 2.8.

I also disagree with your assertion that 3.x releases have significant breaking changes. Could you point to any specific major breaking changes between 3.x releases?

2 to 3 didn’t break syntax for most code either. It largely cleaned house on sensible API defaults.

> Could you point to any specific major breaking changes between 3.x releases?

I cannot, but I can tell you that anything AI often requires finding the right combination of Python + cuXXX + some library. And while I understand the cu-implications, for some reason the Python version is also in this formula.

I literally have four Python versions installed and removed from PATH, because if I delete 3.9-3.11, they will be needed again the next day and there’s no meaningful default.

They'll all co-exist. Add them all to your PATH, make one the default python3, and request specific versions when they are required.

Those are ABI changes and not changes to the language.

2 to 3 broke lots of code. Print became a function. Imports moved around. And there were subtle changes in the semantics of some things. Famously, strings changed, and that definitely affected a lot of packages.

Quite a bit of that could be fixed by automated tooling, but not all of it, and the testing burden was huge, which meant a lot of smaller packages did not convert very quickly and there were ripple effects.
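
For anyone who missed the 2 to 3 era, a small sketch of the kinds of changes mentioned above, written for Python 3. The variable and function names are made up for illustration, and the division example is my own pick for a "subtle semantic change", not necessarily the one the comment had in mind.

```python
import sys
import urllib.request  # Python 2 spelled this "import urllib2"

# print is an ordinary function in Python 3, so it takes keyword arguments.
print("written", "to", "stderr", sep="-", file=sys.stderr)

# Text (str) and binary data (bytes) are distinct types in Python 3.
text = "café"
data = text.encode("utf-8")
assert isinstance(text, str) and isinstance(data, bytes)
assert len(text) == 4  # four characters
assert len(data) == 5  # "é" takes two bytes in UTF-8


def mix_text_and_bytes():
    """Return the error Python 3 raises when str and bytes are concatenated."""
    try:
        return text + data  # Python 2 attempted an implicit conversion here
    except TypeError as exc:
        return exc


# A subtler semantic change: / on two ints is true division in Python 3.
half = 7 / 2      # 3.5 (Python 2 gave 3)
floored = 7 // 2  # 3 in both versions
```

The print and import changes are the sort of thing automated tooling could rewrite; the str/bytes split is the part that needed a human to decide what each value was supposed to be.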

Yes, 2 to 3 changed things. We’re discussing what changed in between different versions of 3.

Fair enough. You may be totally right here; as I mentioned, I haven't used Python much at all since about 2017 and haven't paid it much attention in a while. I retract my comment.

Regarding breakage in 3.x, all I know is that I recall several times where I did a Linux system update (rolling release), and that updated my Python to a newly released version which broke various things in my system. I'm pretty sure one of these was v3.10, but I forget which others caused me problems which I could only solve by pinning Python to an older release.

It's entirely possible though that no actual APIs were broken and that this was just accidental bugs in the release, or the packages were being naughty and relying on internals they shouldn't have relied on, or something else.

To your last point: it’s neither the language nor the packages but rather it’s the ABI.

Python isn’t fully ABI stable (though it’s improved greatly) so you can’t just intermix compiled dependencies between different versions of Python.

This is true for many packages in your distro as well.
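
A minimal sketch of where that ABI coupling shows up in practice: the filename suffixes an interpreter uses for compiled extension modules. The function name `describe_extension_abi` is made up for illustration; the `sysconfig` and `importlib.machinery` calls are standard library.

```python
import importlib.machinery
import sys
import sysconfig


def describe_extension_abi():
    """Collect the tags this interpreter uses for compiled extension modules."""
    return {
        "implementation": sys.implementation.name,
        "version": "%d.%d" % sys.version_info[:2],
        # Suffix used when building extensions for this interpreter. On
        # CPython 3.11 on x86-64 Linux it looks something like
        # ".cpython-311-x86_64-linux-gnu.so".
        "preferred_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        # Every suffix the import system will try for extension modules.
        "accepted_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


if __name__ == "__main__":
    for key, value in describe_extension_abi().items():
        print(key, "=", value)
```

An extension built with the version-specific suffix of one minor release is not found under that name by another release, which is roughly why a system-wide Python upgrade can strand compiled packages until they are rebuilt.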

My company would pay for 2.8. And a lot of internal teams that used Python 2 now use Go for that. Not ML but devops/infra orchestration.

Python 3 made no sense from a cost/risk perspective for teams with a lot of working and mostly finished Python 2 code.

Most of those are old, long-deprecated things, and in general they are all straight-up improvements. Python is not my main thing, so I'm not the best person to answer this, but I listed a few that I am sure triggered errors in some code bases (I'm not saying they are all major). Python's philosophy makes most of those pretty easy to handle: for example, instead of foo you now have to be explicit and choose either foo_bar or foo_baz. C, by contrast, still has the completely bonkers function 'gets' in common C libraries such as glibc; it was deprecated long ago and even removed from the C11 standard, yet it will probably be around for a long time. The C standard library, the Windows C API, and the Linux C API are to a large extent add-only, because things should stay bug-to-bug compatible. Python is not like that. This has its perks, but it may cause your old Python code to just not run. It may be easy to modify, but easy is significantly harder than nothing at all.
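
The time.clock() removal listed below is a good instance of the "foo becomes foo_bar or foo_baz" pattern. A minimal sketch of the two explicit replacements side by side; `measure` and `nap` are made-up helper names.

```python
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # wall-clock time, includes sleeping
    cpu_start = time.process_time()   # CPU time of this process, excludes sleeping
    result = func(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


def nap(seconds):
    # Hypothetical workload that waits instead of computing.
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    _, wall, cpu = measure(nap, 0.05)
    print("wall: %.3fs, cpu: %.3fs" % (wall, cpu))
```

For a function that mostly sleeps, the two numbers differ a lot, which is the ambiguity the old single function hid.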

https://docs.python.org/3/whatsnew/3.3.html#porting-to-pytho...

> Hash randomization is enabled by default. Set the PYTHONHASHSEED environment variable to 0 to disable hash randomization. See also the object.__hash__() method.

https://docs.python.org/3/whatsnew/3.4.html#porting-to-pytho...

> The deprecated urllib.request.Request getter and setter methods add_data, has_data, get_data, get_type, get_host, get_selector, set_proxy, get_origin_req_host, and is_unverifiable have been removed (use direct attribute access instead).

https://docs.python.org/3/whatsnew/3.5.html#porting-to-pytho...

https://docs.python.org/3/whatsnew/3.6.html#removed

> All optional arguments of the dump(), dumps(), load() and loads() functions and JSONEncoder and JSONDecoder class constructors in the json module are now keyword-only. (Contributed by Serhiy Storchaka in bpo-18726.)

https://docs.python.org/3/whatsnew/3.7.html#api-and-feature-...

> Removed support of the exclude argument in tarfile.TarFile.add(). It was deprecated in Python 2.7 and 3.2. Use the filter argument instead.

https://docs.python.org/3/whatsnew/3.8.html#api-and-feature-...

> The function time.clock() has been removed, after having been deprecated since Python 3.3: use time.perf_counter() or time.process_time() instead, depending on your requirements, to have well-defined behavior. (Contributed by Matthias Bussonnier in bpo-36895.)

https://docs.python.org/3/whatsnew/3.9.html#removed

> array.array: tostring() and fromstring() methods have been removed. They were aliases to tobytes() and frombytes(), deprecated since Python 3.2. (Contributed by Victor Stinner in bpo-38916.)

> Methods getchildren() and getiterator() of classes ElementTree and Element in the ElementTree module have been removed. They were deprecated in Python 3.2. Use iter(x) or list(x) instead of x.getchildren() and x.iter() or list(x.iter()) instead of x.getiterator(). (Contributed by Serhiy Storchaka in bpo-36543.)

> The encoding parameter of json.loads() has been removed. As of Python 3.1, it was deprecated and ignored; using it has emitted a DeprecationWarning since Python 3.8. (Contributed by Inada Naoki in bpo-39377)

> The asyncio.Task.current_task() and asyncio.Task.all_tasks() have been removed. They were deprecated since Python 3.7 and you can use asyncio.current_task() and asyncio.all_tasks() instead. (Contributed by Rémi Lapeyre in bpo-40967)

> The unescape() method in the html.parser.HTMLParser class has been removed (it was deprecated since Python 3.4). html.unescape() should be used for converting character references to the corresponding unicode characters.

https://docs.python.org/3/whatsnew/3.10.html#removed

https://docs.python.org/3/whatsnew/3.11.html#removed

https://docs.python.org/3/whatsnew/3.12.html#removed
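
To show how small most of these migrations are, here is a rough sketch of the replacements named in the quotes above, assuming Python 3.9 or later. The variable and function names are made up; the library calls are the ones the release notes point to.

```python
import array
import asyncio
import html
import json
import urllib.request
import xml.etree.ElementTree as ET

# json (3.6+): optional arguments are keyword-only.
encoded = json.dumps({"b": 1, "a": 2}, sort_keys=True)


def positional_json_option():
    """Return the error raised when an option is passed positionally."""
    try:
        json.dumps({"a": 1}, True)  # skipkeys passed positionally
    except TypeError as exc:
        return exc
    return None


# urllib.request.Request (3.4+): plain attributes replace the get_* methods.
# Building a Request does not open a network connection.
req = urllib.request.Request("https://example.com/path", data=b"payload")
request_parts = (req.type, req.host, req.selector, req.data)

# array.array (3.9+): tobytes()/frombytes() replace tostring()/fromstring().
numbers = array.array("i", [1, 2, 3])
restored = array.array("i")
restored.frombytes(numbers.tobytes())

# ElementTree (3.9+): list(x) replaces x.getchildren(),
# x.iter() replaces x.getiterator().
root = ET.fromstring("<a><b/><c><d/></c></a>")
children = [child.tag for child in list(root)]
descendants = [node.tag for node in root.iter()]

# html (3.9+): html.unescape() replaces HTMLParser.unescape().
unescaped = html.unescape("&lt;b&gt; &amp; &quot;")


# asyncio (3.9+): module-level functions replace the Task class methods.
async def current_is_running():
    task = asyncio.current_task()
    return task in asyncio.all_tasks()


task_seen = asyncio.run(current_is_running())
```

None of these is hard on its own; the cost is finding every call site and retesting, which is the "easy is significantly harder than nothing at all" point.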

Thanks. That’s a good list, though I think the majority of the changes were from deprecations early in the 3.x days and are API changes, whereas the OP was talking about syntax changes for the most part.

Are there communities that handle such a change well? At least that went better than Perl and Raku.

Anything where the language frontend isn’t tied to the ABI compatibility of the artifacts, I think. They can mix versions/editions without worry.

I think it’s a larger problem with interpreted languages, where all the source has to be in a single version. In that case I can't think of much.