by bb88 831 days ago
Working with threads is a pain regardless of which language you use.

Some might say: "Use Go!" Alas: https://songlh.github.io/paper/go-study.pdf

After a couple decades of coding, I can say that threading is better if it's tightly controlled, limited to usages of tight parallelism of an algorithm.

Where it doesn't work is in a generic worker pool where you need to put mutex locks around everything -- and then prod randomly deadlocks in ways the developer boxes can't recreate.

5 comments

> After a couple decades of coding, I can say that threading is better if it's tightly controlled, limited to usages of tight parallelism of an algorithm.

This may be a case of violent agreement, but there are a few clear cases where multithreading is easily viable. The best case is some sort of parallel-for construct, even if you include parallel reductions, although there may need to be some smarts around how to do the reduction (e.g., different methods for reduce-within-thread versus reduce-across-thread). You can extend this to heterogeneous parallel computations, a general, structured fork-join form of concurrency. But in both cases, you essentially have to forbid inter-thread communication between the fork and the join parameters. There's another case you might be able to make work, where you have a thread act as an internal server that runs all requests to completion before attempting to take on more work.
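
As a rough illustration of the parallel-for-with-reduction shape described above, here is a minimal Python sketch. The helper names (`chunked`, `reduce_within_worker`, `parallel_sum_of_squares`) are made up for this example; the only real API used is `concurrent.futures.ThreadPoolExecutor`. Each worker reduces its own slice, nothing is shared between fork and join, and the partial results are combined serially afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def reduce_within_worker(chunk):
    # Reduce-within-thread: each worker only reads its own slice,
    # so no locks and no inter-thread communication are needed.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=4):
    chunks = chunked(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:  # fork
        # list() waits for every worker to finish, which acts as the join.
        partials = list(pool.map(reduce_within_worker, chunks))
    # Reduce-across-threads: combine the partial results serially.
    return reduce(operator.add, partials, 0)


print(parallel_sum_of_squares(list(range(1000))))  # 332833500
```

This shows the structure only. On a standard GIL build of CPython, threads will not speed up pure-Python arithmetic like this; swapping in `ProcessPoolExecutor` is the usual workaround, at the cost of pickling the chunks.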

What the paper you link to is pointing out, in short, is that message passing doesn't necessarily free you from the burden of shared-mutable-state-is-bad concurrency. The underlying problem is largely that communication between different threads (or even tasks within a thread) can only safely occur at a limited number of safe slots, and any communication outside of that is risky, be it an atomic RMW access, a mutex lock, or waiting on a message in a channel.
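
To make the "waiting on a message in a channel" risk concrete, here is a small Python sketch using `queue.Queue` as a stand-in for a channel. It is not taken from the linked paper; `run_demo` and the deliberately buggy client are invented for illustration. There is no shared data and no mutex in the user code, yet both sides end up waiting on each other. The timeouts exist only so the example terminates; without them it would hang.

```python
import queue
import threading


def run_demo(timeout=0.2):
    requests = queue.Queue()
    replies = queue.Queue()
    status = {}

    def worker():
        try:
            req = requests.get(timeout=timeout)  # waits for a request first
            replies.put(req * 2)
            status["worker"] = "replied"
        except queue.Empty:
            status["worker"] = "timed out"

    t = threading.Thread(target=worker)
    t.start()

    # Bug: the client waits for a reply but never sent the request.
    try:
        status["client"] = replies.get(timeout=timeout)
    except queue.Empty:
        status["client"] = "timed out"

    t.join()
    return status


print(run_demo())
```

Both entries in the returned dict come back as "timed out": each side was blocked on a message the other side was never going to send.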

> Working with threads is a pain regardless of which language you use.

That's not true at all. F#, Elixir, Erlang, LabVIEW, and several other languages make it very easy. Python makes it incredibly tough.

> Python makes it incredibly tough.

I disagree, Python makes it incredibly easy to work with threads in many different ways. It just doesn't make threads faster.

In what way? Threading, asyncio, tasks, event loops, multiprocessing, etc. are all complicated and interact poorly if at all. In other languages, these are effectively the same thing, lighter weight, and actually use multicore.

If I launch 50 threads with runaway while loops in Python, it takes minutes to launch and barely works afterward. I can run hundreds of thousands and even millions of runaway processes in Elixir/Erlang that launch very fast and keep chugging along just fine.

> If I launch 50 threads with runaway while loops in Python, it takes minutes to launch and barely works afterward. I can run hundreds of thousands and even millions of runaway processes in Elixir/Erlang that launch very fast and keep chugging along just fine.

I'm not sure that argument helps your position on threading. I once saw a Java program spin off 3000 threads doing god knows what. Debugging the fucking thing was impossible.

The point there is that processes in Elixir and Erlang are effectively like functions, in that you do not need to "manage" them in any sort of way. They are automatically distributed across all cores, pre-emptively scheduled, killable, have a built-in inbox, etc. One doesn't need to worry about what concurrency library to use, nor manually create mailboxes using queues or whatever else. It just works, and you fire them off to do whatever you need. So there is no ceremony. Threads in many other languages, and in Python in particular, require a huge amount of ceremony and management.
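
For a sense of the ceremony being described, here is a minimal sketch of a hand-built mailbox in Python: a thread plus a `queue.Queue`, with a sentinel object for shutdown. The `Actor` class is a hypothetical helper written for this example, not a standard-library or third-party API, and it leaves out things a BEAM process gets for free, such as supervision and pre-emptive scheduling.

```python
import queue
import threading


class Actor:
    """A thread with a hand-built mailbox (illustrative helper only)."""

    _STOP = object()  # sentinel that tells the loop to exit

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):
        self._mailbox.put(msg)

    def stop(self):
        # Queue the sentinel behind any pending messages, then wait.
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            msg = self._mailbox.get()
            if msg is self._STOP:
                break
            self._handler(msg)


received = []
actor = Actor(received.append)
for i in range(3):
    actor.send(i)
actor.stop()
print(received)  # [0, 1, 2]
```

Every piece here (the queue, the loop, the sentinel, the join) is something the programmer has to write and get right by hand.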

> require a huge amount of ceremony and management

I think Java made it quite easy to spin off threads, and again, it doesn't help the argument. It just made the f'ing thing worse. Race conditions are still f'ing hard to solve. Particularly when shared mutable state exists outside of the program.

The whole purpose of threads is to improve overall speed of execution. Unless you're working with a very small number of threads (single digits), that's a very hard-to-achieve goal in Python. I wouldn't count this as easy to use. It's easy to program, yes, but not easy to get working with reasonably acceptable performance.

And the Python people would just point to multiprocessing... which works pretty well.

Which has its own set of challenges and yet another implementation of queue.

Yes, but the shared-mutable-state issue goes away.

It's not such a big pain in every language. And certainly not as hard to get working with acceptable performance in many languages.

Even if you have zero shared resources, zero mutexes, no communication whatsoever between threads, it's a huge pain in Python if you need more than 10 or so threads going. And many times the GIL is the bottleneck.

This is where Python's GIL bit me: I was more than familiar with how to shoot myself in the foot using threads in other languages, and careful to avoid those traps. Threads spun up only in situations where they had their own work to do and well-defined conditions for how both failure and success would be reported back to the thread that requested it, along with a pool that wouldn't exceed available resources.

Like every other language I've used this approach with, nothing bad happened - the program ran as expected and produced correct results. Unlike every other language, spreading calculations across multiple cores didn't appreciably improve performance. In some cases, it got slower.
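
A small way to observe this for yourself, assuming a standard GIL build of CPython: run the same pure-Python, CPU-bound function serially and then across a thread pool, and compare wall-clock times. The function `count_primes` and the job sizes are arbitrary choices for this sketch. The results match either way; the timings depend on the machine and interpreter, so none are claimed here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Pure-Python, CPU-bound work: naive count of primes below limit."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed(fn):
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


jobs = [20_000] * 4

serial, t_serial = timed(lambda: [count_primes(n) for n in jobs])
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded, t_threaded = timed(lambda: list(pool.map(count_primes, jobs)))

# Same answers either way; only the wall-clock time is in question.
print(serial == threaded)
print(f"serial {t_serial:.2f}s, threaded {t_threaded:.2f}s")
```

On a GIL build the two timings are typically close, and the threaded run can come out slower because of lock contention. Replacing `ThreadPoolExecutor` with `ProcessPoolExecutor` is the usual way to get actual multicore use for work like this.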

Eventually scrapped it all, and went with an approach closer to what I'd have done with C and fork() decades ago... Which, to Python's credit, was fairly painless and worked well. But it caught me off-guard, because with asyncio for IO-bound stuff, it didn't seem like threads really have much of a purpose in Python, other than to be a tripwire for unwary and overconfident folks like myself!

Not disagreeing. The only case for threading in Python is for spinning something up to handle IO.

But now with async even that goes away.
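
A minimal sketch of the async alternative for IO-bound work, with `asyncio.sleep` standing in for a real network call (the `fake_fetch` coroutine and its delays are invented for this example). Three waits overlap on a single thread, so no thread pool is involved.

```python
import asyncio
import time


async def fake_fetch(name, delay):
    # asyncio.sleep stands in for a network call; no real I/O happens here.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    start = time.perf_counter()
    # gather runs the coroutines concurrently and keeps argument order.
    results = await asyncio.gather(
        fake_fetch("a", 0.1),
        fake_fetch("b", 0.1),
        fake_fetch("c", 0.1),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results)  # ['a done', 'b done', 'c done']
print(f"{elapsed:.2f}s")
```

The elapsed time should land near the longest single delay rather than the sum of all three, since the waits overlap. This only helps when the work is actually waiting on IO; blocking or CPU-bound calls inside a coroutine stall the whole event loop.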

Concurrency with rayon in Rust isn't a pain, I'd say. It's basically hidden away from the user.