| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by nomel 972 days ago

Does anyone have any good counterpoints to this? From my, naive, perspective, this seems to be relatively true.

I've always assumed that, by now, I would be able to write code, in a semi mainstream language, and it would be made somewhat parallel, by the compiler. No need for threads, or me thinking of it.

There's projects like https://polly.llvm.org, but I guess I assumed there would be more progress through the decades.

8 comments

jcranmer 972 days ago

Proving legality of transformations in the compiler is frequently impossible. Consequently, the main mode of implementation has been to essentially think of the problems in terms of the user saying that this loop is parallel, please make it run in parallel. OpenMP or Rust's rayon crate, for example. The other similar innovation has been programming SIMD as if each lane were an independent thread, which is essentially the model of ispc or CUDA (or #pragma omp simd, natch).

The other big impossible task is that most code isn't written to be able to take advantage of theoretical autoparallelization--you really want data to be in struct-of-arrays format, but most code tends to be written in array-of-struct format. This means that vectorization cost model (even if proven, whether by user assertion or sufficiently smart compiler, legal) sees it needs to do a lot of gathers and scatters and gives up on a viable path to vectorization really quickly.

link

mjburgess 972 days ago

Maybe some history will help here too. In the 90s, the data model of most programming languages wasnt even array-of-structs, but array-of-pointers to otehr pointers to other pointers...

And the majority of software we've inherited is written this way.

In the 90s this didnt matter, since dereferencing a pointer was comparably expensive to arithmetical operations. But with modern CPUs with massive caches and more native parallelisation, the difference is dramatic.

So, even now, the majority of languages we're using; and almost all code we've inherited today, are as far away as you can get from efficiently using modern CPUs.

The task is first to change all these languages to enable ergonomic programming without tons of indirection -- we're very far away from even providing basic tools for performant code

link

eslaught 972 days ago

It's not totally clear what you're looking for.

As you noted, polyhedral compilers work on a pretty restricted subset of programs, but are fairly impressive in what they do. There has been research on distributed code generation [1] as well as GPUs [2]. While there has been work on generalizing the model [3], I think the amount of parallelization that a compiler can do is still very limited by its ability to analyze the code (which is to say, highly restricted).

Then you've got a large class of data-parallel-ish constructs like Rayon [4] as well as executors which may work their way into the C++ standard at some point [5]. How much safety these provide depends greatly on the underlying language. Generally speaking, the constructs here are usually pretty restricted (think parallel map), but often you can write more-or-less arbitrary code inside, which is often not the case in the polyhedral compilers.

If you don't care so much about safety and just want access to every parallel programming construct under the sun, Chapel [6] may be interesting to you. There is no attempt here, as best I can tell, to offer any sort of safety guarantees, but maybe that's fine.

On the other end of the spectrum you have languages like Pony [7] that do very much care about safety, but (I assume, haven't looked deeply) this comes with tradeoffs in expressiveness.

(I work in this area too [8].)

Overall, there are some very stringent tradeoffs involved in parallelizing code and while it certainly has been and continues to be a very active area of research, there's only so much you can do to tackle fundamentally intractable analysis problems that pop up in the area of program parallelization.

[1]: https://www.csa.iisc.ac.in/~udayb/publications/uday-sc13.pdf

[2]: https://arxiv.org/pdf/1804.10694.pdf

[3]: https://inria.hal.science/file/index/docid/551087/filename/B...

[4]: https://docs.rs/rayon/latest/rayon/

[5]: https://github.com/NVIDIA/stdexec (disclaimer: I did a quick Google search on this, not 100% sure this is the best link)

[6]: https://chapel-lang.org/

[7]: https://www.ponylang.io/

[8]: https://regent-lang.org/

link

bmacho 972 days ago

There is also HVM [9], which can run sequentially written code parallel for some degrees. (It can run parallel some sequential Haskell code naively transpiled to HVM that GHC doesn't parallelize.)

[9] : https://github.com/HigherOrderCO/HVM

link

dugmartin 972 days ago

Languages like Erlang (or Elixir) that naturally split programs into isolated processes with local state and that communicate via message passing map well onto multi core systems. No need to have the compiler figure that out for you - instead it is expressed directly in the code.

link

mrkeen 972 days ago

One of the prerequisites to this is for the mainstream to stop doing things which are incompatible with concurrency.

Mutation (and other effects) makes the order of computations important. If you're writing to and reading from variables, the compiler is not free to move those operations around, or schedule them simultaneously.

And you probably don't want to be rid of all mutation. So what if you separated the mutating from the non-mutating? Well you'd need a sufficiently powerful type system. Likely one without nulls - as they can punch a hole through any type checking.

If you want this stuff in the mainstream, you at least have to get all the nulls and mutation out of the mainstream, which I don't think will happen.

The industry for the most part heeded "goto considered harmful" (1968), but hasn't done so with "the null reference is my billion dollar mistake (2009)". Maybe we just have to wait.

link

7734128 972 days ago

Especially when doing things like mapping or list comprehension. I'd love to be able to do operations on collections in parallel by simply marking my functions as pure. Just a simple xs.map {x -> f(x)} combined with "f" marked as pure confirmed by the compiler to make the magic happen.

link

IshKebab 972 days ago

It isn't really. There have been plenty of innovations in parallel computing:

* Go, with goroutines and heavy use of channels.

* Rust which is free of data races and generally improves the safety of multithreaded programming via Sync, Send and just safer APIs (e.g. Mutex).

* Chapel, which is a language designed primarily for multithreaded and multiprocess computing.

Those are just the ones I know about. There's obviously way more.

link

tylerhou 972 days ago

Take a look at Halide, which can autovectorize and multi-thread graphics computations (but does require a restricted language).

link

nomel 972 days ago

I was thinking outside of "embarrassingly parallel" [1] type work. :) But, that is fair.

[1] https://en.wikipedia.org/wiki/Embarrassingly_parallel#:~:tex....

link

bear8642 972 days ago

Array languages, such as APL and others, tend to be easily parallisable given primitives tend to focus on intent and how to transform data rather than imperative operations.

Some of the SIMD operations feel very reminicent of APL primitives

link

aidenn0 972 days ago

So your best bet for a modern language is one from 1966?

link

bear8642 970 days ago

Just because something's old doesn't mean it's not relevant, yes might not've been the best language to suggest - the more modern BQN, Uiua and Singeli exist too - but it's still a fairly niche paradigm. Ideas tend to come in cycles too - look at the 1980s ideas of the transputer or connection machine.

I wanted to point towards a programming paradigm that's approach enables you to take advantage of the parallel execution possible within chips today due to the notation being both precise in intent yet vague in execution. Take summing an array (`+/vector`) or selecting values given a boolean mask (`mask/values`) - both these very simple expressions are expressible directly in SIMD instructions, as there's no for loop index enforcing an order.

link