by CJefferson 620 days ago
Good to get advance notice, if I read all the way down, that they will silently completely change the behavior of multiprocessing in 3.14 (only on Unix/Linux, in case other people wonder what’s going on), which is going to break a bunch of programs I work with.

I really like using Python, but I can’t keep using it when they just keep breaking things like this. Most people don’t read all the release notes.

4 comments

Not defending their specific course of action here, but you should probably try to wade into the linked discussion (https://github.com/python/cpython/issues/84559). Looks like the push to disable warnings (in 3.13) is mostly coming from one guy.
I think I should have a dig.

While it’s not perfect, I know a few other people who do “set up lots of data structures, including in libraries, then make use of the fact multiprocessing uses fork to duplicate them”. While fork always has sharp edges, it’s also long been clearly documented that’s the behavior on Linux.

I'm pretty sure that significantly more people were burned by fork being the default with no actual benefit to their code, whether because of the deadlocks etc. that it triggers in multithreaded non-fork-aware code, or because their code wouldn't work correctly on other platforms. Keeping it there as an option that one can explicitly enable for those few cases where it's actually useful and with full understanding of consequences is surely the better choice for something as high-level as Python.
I agree that fork was an awful default.

However, changing the default silently just means people's code is going to change behaviour between versions, or silently break if someone with an older version runs their code. At this point, it's probably better to just require people give an explicit choice (they can even make one of the choice names be 'default' or something, to make life easy for people who don't really care).

I'm with you on undesirability of silent change of behavior. But requiring people to make an explicit choice would immediately break a lot more code, because now all the (far more numerous) instances of code that genuinely doesn't care one way or another won't run at all without changes - and note that for packages, this also breaks anyone depending on them, requiring a fix that is not even in their code. So it's downsides either way, and which one is more disruptive to the ecosystem depends on the proportion of code affected in different ways. I assume that they did look at existing Python code out in the wild to get at least an eyeball estimate of that when making the decision.
> posix_spawn() now accepts None for the env argument, which makes the newly spawned process use the current process environment

That is the thing about fork(), spawn(), and even system() being essentially wrappers around clone() in glibc and musl.

You can duplicate the behavior of fork() without making the default painful for everyone else.

In musl, system() calls posix_spawn(), which calls clone().

All that changes is replacing fork(), which is nothing more than a legacy convenience alias with real issues and foot guns when multiple threads are involved.

You are complaining about spawn()?

Both fork() and spawn() are just wrappers around clone() on most libc types anyway.

spawn() was introduced to POSIX in the last century to address some of the problems with fork(), especially those related to multithreading, so I am curious how your code is so dependent on UTM, yet multithreading.

My code isn't dependent on multi-threading at all.

It uses fork in Python multiprocessing, because many packages can't be "pickled" (the standard way of copying data structures between processes), so instead my code looks like:

* Set up big complicated data-structures.

* Use fork to make a bunch of copies of my running program, and all my data structures.

* Use multiprocessing to make all those python programs talk to each other and share work, thereby using all my CPU cores.
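A minimal sketch of the three steps above, assuming a Unix system where the "fork" start method is available. The names `run_with_fork`, `table`, and `worker` are hypothetical, and the lambdas merely stand in for objects that the pickle module cannot serialise:

```python
import multiprocessing as mp


def run_with_fork(n_workers=3):
    # Step 1: build the state in the parent. Lambdas cannot be pickled,
    # so they stand in here for "big complicated" unpicklable structures.
    table = {i: (lambda x, i=i: x * 10 + i) for i in range(n_workers)}

    # Step 2: ask for fork explicitly instead of relying on the default.
    # get_context raises ValueError on platforms without fork.
    ctx = mp.get_context("fork")
    results = ctx.SimpleQueue()

    def worker(key):
        # Runs in a forked child. `table` was inherited from the parent's
        # memory at fork time; it was never pickled or sent over a pipe.
        results.put((key, table[key](1)))

    # Step 3: start the children, collect their (picklable) results, join.
    procs = [ctx.Process(target=worker, args=(k,)) for k in table]
    for p in procs:
        p.start()
    out = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return out


if __name__ == "__main__":
    print(run_with_fork())
```

Only the small result tuples cross the process boundary through pickling. With the "spawn" start method the same code would fail, since the nested `worker` function cannot be pickled for the child.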

'Threading' is an overloaded term. And while I didn't know, I was wondering whether, at the library level, the fact that posix_spawn() pauses the parent while fork() doesn't was what you were leveraging.

The python multiprocessing module has been problematic for a while, as the platform abstractions are leaky and to be honest the POSIX version of spawn() was poorly implemented and mostly copied the limits of Windows.

I am sure that some of the recent deadlocks are due to this pull request as an example that calls out how risky this is.

https://github.com/python/cpython/pull/114279

Personally knowing the pain of fork() in the way you are using it, I have moved on.

But I would strongly encourage you to look into how clone() and the CLONE_VM and CLONE_VFORK options interact, document your use case and file an actionable issue against the multiprocessing module.

Go moved away from fork in 1.9 which may explain the issues with it better than the previous linked python discussion.

But looking at the git blame, all the 'fixes' have been about people trading known problems and focusing on the happy path.

My reply was intended for someone to address that tech debt and move forward with an intentionally designed refactoring.

As I just focus on modern Linux, I avoid the internal submodule and just call clone() in a custom module or use python as glue to languages that have better concurrency.

I found where subprocess moved to posix_spawn() that may help.

https://bugs.python.org/issue35537

My guess is that threads in CPython are an end goal. While setting execution context will get you past this release, fork() has to be removed if the core interpreter is threaded.

The delta between threads and fork/exec has narrowed.

While I don't know if that is even an option for you, I am not seeing any real credible use cases documented to ensure that model is supported.

Note, I fully admit this is my own limits of imagination. I am 100% sure there are valid reasons to use fork() styles.

Someone just needs to document them and convince someone to refactor the module.

But as it is not compatible with threads, has a ton of undefined behavior and security issues, fork() will be removed without credible documented use cases that people can weigh when considering the tradeoffs.

Unfortunately, they learned the wrong lesson from the 2->3 transition. Break things constantly instead of all at once. :p

Still, this one doesn’t seem too bad. Add method=FORK now and forget about it.
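For reference, "method=FORK" is shorthand here rather than a real parameter; the actual multiprocessing spellings are `set_start_method("fork")` and `get_context("fork")`. A rough sketch of pinning the start method so the code no longer depends on the version's default (the helper name `pick_context` is made up for illustration):

```python
import multiprocessing as mp


def pick_context(preferred="fork"):
    """Return a context for `preferred`, or the platform default if this
    platform does not offer that start method (e.g. no fork on Windows)."""
    if preferred in mp.get_all_start_methods():
        return mp.get_context(preferred)
    return mp.get_context()


if __name__ == "__main__":
    # Option 1: a context object, which leaves the global default alone.
    ctx = pick_context("fork")
    print(ctx.get_start_method())

    # Option 2: set the global default once, early in the main module.
    # mp.set_start_method("fork")
```

A context is probably the safer choice inside a library, since `set_start_method` changes the default for the whole program and raises RuntimeError if it is called a second time without `force=True`.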

> I really like using Python, but I can’t keep using it when they just keep breaking things like this.

So much perl clutching. Just curious, since I guess you've made up your mind, what's your plan to migrate away? Or are you hoping maintainers see your comment and reconsider the road-map?

Rust. I've already re-written several of my decent sized research programs to Rust, and plan to finish converting what's left soon.

Rust isn't perfect (no language is), but they do seem to try much harder to not break backwards compatibility.

Not the person you are responding to, but my Python 3 migration plan was to move to Go for all new projects.
I'm sure your departure from the community will be the tectonic shift that'll finally get the PSF to change their course.