Hacker News
by Gh0stRAT 968 days ago
Humans are bad at reasoning about multiple threads simultaneously, so I suspect the more practical shift is the trend we've already been seeing toward more declarative syntax.

E.g. `for` loops are being replaced by `foreach` loops, `map` and `filter` operations, etc. These tell the compiler/interpreter that you want to do some operation to all the items in your data structure, leaving it up to the compiler/runtime whether and how to parallelize the work for you.
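A minimal Python sketch of that contrast (the `square` function and the input list are made up for illustration). The loop spells out the iteration order and manages its own accumulator, while the `map`/`filter` form only says what to apply to each item. Python's runtime does not parallelize anything on its own, so the thread pool here is an explicit opt-in, but once the work is phrased as a map, handing it to a pool is a small change:

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """Pure function: no side effects, so the call order does not matter."""
    return x * x


items = [1, 2, 3, 4, 5]

# Imperative: we dictate the order and manage the accumulator ourselves.
loop_result = []
for x in items:
    if x % 2 == 1:
        loop_result.append(square(x))

# Declarative: say what to do to each item, not how to walk the list.
declarative_result = list(map(square, filter(lambda x: x % 2 == 1, items)))

# The same map handed to a thread pool (explicit, not automatic).
odd_items = [x for x in items if x % 2 == 1]
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map yields results in input order, whatever order the calls ran in.
    parallel_result = list(pool.map(square, odd_items))
```

All three variants produce the same list because `square` has no side effects; that property is what makes the substitution safe.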

13 comments

I would upvote this 100 times if I could.

I've thought this way ever since macOS added Grand Central Dispatch [1]. Of course, I thought the industry would follow quickly and that tooling would coalesce around this concept. Seems the industry wants to take its sweet time.

[1] - https://en.wikipedia.org/wiki/Grand_Central_Dispatch

I mean, OpenMP dates back to 1997 (1998 for C and C++). Apple, however, has never supported it for what can only be selfish reasons (particularly since Clang has a quite good implementation provided by Intel, which can easily be installed on a Mac if you want). GCD came a decade later.

For basic parallelism, nothing beats OpenMP for ease of adapting existing code (often a single "#pragma omp parallel for" directive is enough). Even for more complex parallelism, particularly where per-thread resources need to be managed, OpenMP still provides a much simpler programming model than the alternatives.

OpenMP and GCD solve different problems. In most cases you wouldn't want to use GCD to parallelize the same tasks you're parallelizing with OpenMP. GCD is better suited to one-off cases (toss this task into the queue, toss that task into the queue; or "as we get new items from the user, toss the processing into the queue", where we don't know the rate at which new items arrive, so batching doesn't make as much sense), versus OpenMP, which targets things like scientific computing/simulations where you know you have a million objects you want to perform a computation on. The GCD version of the same would be much slower if you spawned a task per work item, or else you'd recreate parts of OpenMP to divide the work across a smaller number of tasks. And you wouldn't want to use OpenMP to parallelize the kind of things you toss into a work queue model like the one GCD offers.

Sure, OpenMP and GCD provide different interfaces around the same concept of a managed threadpool. Given both, one would use them for different tasks (in the same way one actually uses OpenMP and std::async for complementary purposes). But in the context of GP's basic parallelized for/map/reduce operations, either can be used fine (although OpenMP would probably be more pleasant to write).

I'm not familiar with GCD, but after reading the wiki page I'd say most languages have something like that: a queue that you can add things to, whose items are then processed by multiple threads of execution.
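For what it's worth, the "queue that multiple threads drain" shape looks roughly like this in Python. This is only a sketch of the general work-queue pattern, not of GCD's actual API; `worker`, `run_work_queue` and the doubling "work" are made-up names for illustration:

```python
import queue
import threading


def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Pull items off the shared queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:              # sentinel: no more work for this worker
            tasks.task_done()
            break
        with lock:                    # results is shared, so guard the append
            results.append(item * 2)  # stand-in for real processing
        tasks.task_done()


def run_work_queue(items, num_workers: int = 3) -> list:
    tasks: queue.Queue = queue.Queue()
    results: list = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in items:                # items can be enqueued at any rate
        tasks.put(item)
    for _ in threads:                 # one sentinel per worker
        tasks.put(None)
    tasks.join()                      # wait until every queued entry is done
    for t in threads:
        t.join()
    return sorted(results)            # completion order is not deterministic
```

Calling `run_work_queue([1, 2, 3, 4])` returns the doubled values; they are sorted before returning because the order in which workers finish is not fixed.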

I'd also say that most languages have something similar to OpenMP, parallel for loops, etc. Great if you have some read only data in arrays and wish to process it.

However, in my opinion, it doesn't really matter how convenient a parallel / async programming model is to use as the real work is ensuring that there isn't any shared mutable state being updated in parallel. The other issue is, once you have formulated / re-formulated a particular problem to this model, ensuring that it remains this way is pretty challenging on larger teams. Someone can easily unknowingly commit something that breaks such assumptions.
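One way to read "no shared mutable state being updated in parallel", sketched in Python: each worker touches only its own chunk and a local variable, and the results are combined in one place afterwards. The names `chunk_sum` and `parallel_sum` are hypothetical, and nothing in the language enforces the discipline, which is exactly the maintenance risk described above:

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(chunk: list) -> int:
    """Reads only its own argument and writes only a local variable."""
    total = 0
    for x in chunk:
        total += x
    return total


def parallel_sum(data: list, num_chunks: int = 4) -> int:
    # Ceiling division so that every element lands in exactly one chunk.
    size = max(1, -(-len(data) // num_chunks))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partials = pool.map(chunk_sum, chunks)
        # The only combining step runs here, in a single thread.
        return sum(partials)
```

If someone later changes `chunk_sum` to add into a module-level counter instead of returning a value, the code still runs, but the "no shared writes" assumption is silently gone.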

At the end of the day, no matter what kind of code you are writing, you either have tools and processes in place that reduce the risk/mitigate the impacts of bugs, or there is always the risk of serious problems being introduced. An unknowing change that breaks parallelism in another component could just as well be an unknowing change that breaks authentication or defeats a security boundary.

Parallelism introduces an additional class of bugs, but they are fundamentally addressed the same way as any other class of bugs - e.g. testing, tools, and code review. If someone can unknowingly break a system, that means the tools and processes weren't good enough.

I've played around with actors in Swift for shared mutable state, which enforces async access patterns.

https://www.swiftbysundell.com/articles/swift-actors/

It’s been a while but isn’t GCD an OS global queue rather than local to the process?

Back then it wasn't even clear that its adoption would take off.

My HPC programming lectures were done on PVM, and I bet only grey beards know what it stands for.

> Seems the industry wants to take its sweet time.

We're inching towards an in-vogue way to do what Erlang had figured out in the '80s. We'll pick up the pace any day now. Surely.

Grand Central Dispatch is famous for breaking tons of old programs for the grave sin of trying to `fork` a child process.

>eg `for` loops are being replaced by `foreach` loops,`map` and `filter` operations, etc. These tell the compiler/interpreter that you want to do some operation to all the items in your datastructure, leaving it up to the compiler/runtime whether and how to parallelize the work for you.

There's a difference between doing it in the order 1, 2, 3 and in the order 3, 1, 2.

foreach will not be replaced behind the scenes with a multithreaded version, since that changes behaviour.

for is replaced with foreach because usually you don't need the index and foreach is just handier and safer, that's it.

.NET's std lib has Parallel.ForEach for such a thing.

We really don't need magic to write multithreaded code. All we need is just really, really well designed APIs and primitives.

>foreach will not be replaced behind the scenes into multithreaded version since it changes behaviour.

It only (meaningfully) changes behavior if you're both iterating over an ordered data structure and the body of your loop has direct or indirect side effects (like printing, writing to a file, making network requests, etc).
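A small Python sketch of that distinction. `process`, the sleep-based fake workload and the `log` list are assumptions for illustration. The returned values come back in input order, but the side effect (appending to `log`) happens in whatever order the threads finish:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

log = []                      # stand-in for a side effect such as printing
log_lock = threading.Lock()


def process(n: int) -> int:
    # Made-up workload: later items sleep less, so they tend to finish first.
    time.sleep(0.01 * (5 - n))
    with log_lock:
        log.append(n)         # side effect: happens in completion order
    return n * 10             # return value: collected in input order


items = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, items))

# results is always in input order; log holds the same items,
# but usually not in the order 1, 2, 3, 4.
```

If the loop body only computed and returned a value, nobody could tell the difference; the `log.append` is what makes the reordering observable.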

>and the body of your loop has direct or indirect side-effects

So like... a huge % of real-world code bases

Unfortunately yes. That being said, the hottest loops that would benefit the most from added parallelism tend to have fewer side effects already in my experience so things aren't quite so bleak.

> It only (meaningfully) changes behavior if you're both iterating over an ordered datastructure and the body of your loop has direct or indirect side-effects.

Right, and not always even then, because that depends on what the consumer is concerned with as well. But the fact that it can means it's not a safe automatic substitution.

I agree to a large extent but I am referring more to our teaching of Computer Science. For our teaching of Software Engineering I think you're largely correct.

> Humans are bad at reasoning about multiple threads simultaneously

I am not so sure this is true; I believe people are just poorly practiced. My experiences have led me to believe universities silo explicit parallel programming too much. It's generally its own non-compulsory subject in a Comp-Sci major.

C++ has had execution policies (std::execution::par and friends, in <execution>) for a long time now, since C++17 - you pass one to an algorithm like sort, for_each, etc. and the implementation is allowed to parallelize it for you.

I like the way you word this. Similar to the product I make, I describe my mind as an asynchronous queue. I can only reason about one thing at a time, but when I get around to each thing is fairly random.

How this has played out in my life gives me caution about making this standard in computing.

Considering the developments in data engineering land I wonder if we'll be describing our operations as a DAG rather than maps and folds specifically.
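As a rough idea of what "describing operations as a DAG" could look like, here is a Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `run_dag` helper, the node names and the toy tasks are all hypothetical; real data-engineering schedulers are far more elaborate. Each task receives only the results of its declared predecessors, and independent nodes may run concurrently:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(graph: dict, tasks: dict, max_workers: int = 4) -> dict:
    """graph maps node -> set of predecessors; tasks maps node -> callable(deps)."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                      # raises CycleError if not a DAG
    results = {}
    pending = {}                          # future -> node name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for node in sorter.get_ready():
                # Hand the task a private snapshot of its inputs only.
                deps = {p: results[p] for p in graph.get(node, ())}
                pending[pool.submit(tasks[node], deps)] = node
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                node = pending.pop(fut)
                results[node] = fut.result()
                sorter.done(node)         # unblocks this node's successors
    return results


# Toy pipeline: "stats" and "count" are independent and may run in parallel.
graph = {
    "load": set(),
    "clean": {"load"},
    "stats": {"clean"},
    "count": {"clean"},
    "report": {"stats", "count"},
}
tasks = {
    "load": lambda deps: [3, 1, 2],
    "clean": lambda deps: sorted(deps["load"]),
    "stats": lambda deps: sum(deps["clean"]),
    "count": lambda deps: len(deps["clean"]),
    "report": lambda deps: deps["stats"] / deps["count"],
}
results = run_dag(graph, tasks)
```

The caller only declares what depends on what; the order and degree of parallelism are left to the scheduler, which is the same trade as with maps and folds.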

It is more like the mainstream world is finally catching up with the world promised by functional programming ever since Lisp and parallel computing have existed.

Only now can I enjoy on modern hardware what I had to imagine when reading papers about Star Lisp and the Connection Machine, alongside other similar approaches.

Yep! The only thing that remains is to focus on that code being properly functional; i.e. avoiding side-effects. Side-effects and parallelism don't mix well. Wonder if this will give rise to more functional languages.

There will still be cases where more fine-tuned control is warranted. Rust has done this very intelligently by moving data race controls to the compiler level.

How about some HDL semantics with implicit pipelining...

Every statement in an HDL runs in parallel, but you can still write implicitly sequential code in VHDL processes.

Is the difficulty reasoning about threads a bit more specific than that? I think it is reasoning about threads with shared mutable state.

>"Humans are bad at reasoning about multiple threads simultaneously"

Humans are bad at reasoning about way too many things. I think mostly because many are lazy and do not want to learn. The ones who do have few problems. I do not find thread management particularly hard for the most part (there are some exceptions but those are very uncommon).

I love when people want to brag so much that they basically end up claiming to have transcended the human condition.

Fine then. Your compiler is bad at reasoning about multiple threads simultaneously.

So how does my compiler reason about multiple threads?

> Reads and writes do not always happen in the order that you have written them in your code, and this can lead to very confusing problems. In many multi-threaded algorithms, a thread writes some data and then writes to a flag that tells other threads that the data is ready. This is known as a write-release. If the writes are reordered, other threads may see that the flag is set before they can see the written data.

> Reordering of reads and writes can be done both by the compiler and by the processor. Compilers and processors have done this reordering for years, but on single-processor machines it was less of an issue.

https://learn.microsoft.com/en-us/windows/win32/dxtecharts/l...

>"In many multi-threaded algorithms, a thread writes some data and then writes to a flag that tells other threads that the data is ready. This is known as a write-release. If the writes are reordered, other threads may see that the flag is set before they can see the written data."

This is why we have things like WaitForSingleObject and many others that deal properly with the chance of reordering and other concurrency-related issues. All is fine with the reasoning at the CPU, OS, compiler and my own level. One just has to understand what is going on and know the tools. Those who set a plain boolean flag to indicate that the data is ready should not be programming for modern CPUs; they should acquire some basic knowledge first.
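The "write the data, then signal that it is ready" pattern, sketched in Python with `threading.Event` as the synchronization primitive (roughly the role an OS event object waited on with WaitForSingleObject plays on Windows). This only shows the shape of the pattern: CPython will not exhibit the hardware reordering described in the quote, and in C or C++ the equivalent would be an OS primitive or atomics with release/acquire ordering. `shared`, `producer` and `consumer` are made-up names:

```python
import threading

shared = {}                       # data written by the producer
ready = threading.Event()         # a real synchronization object, not a bare bool


def producer() -> None:
    shared["payload"] = [1, 2, 3]     # 1. write the data
    ready.set()                       # 2. then publish "data is ready"


def consumer(out: list) -> None:
    ready.wait()                      # blocks until set() has been called
    out.append(sum(shared["payload"]))  # safe: payload was written before set()


out = []
consumer_thread = threading.Thread(target=consumer, args=(out,))
producer_thread = threading.Thread(target=producer)
consumer_thread.start()               # starting the consumer first is fine
producer_thread.start()
consumer_thread.join()
producer_thread.join()
```

The consumer never polls a flag; it sleeps inside `wait()` until the producer signals, which is the point being argued about above.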

When the brightest minds in computer science, who've spent literal man-centuries developing the theory behind some of the various multiprocessing frameworks currently used, tell you multithreading is hard, I tend to go with those over J Random Hacker News Poster claiming it's easy-peasy and everyone else is lazy.

This reads like satire.

Shades of Clojure's transducers