Hacker News new | ask | show | jobs
by mjfl 972 days ago
Very little innovation in programming languages has happened regarding new realities at the hardware level especially transition from serial to parallel execution.
5 comments

STM is about hardware, not programming languages, but is decently recent so maybe someone will actually try it in hardware and see if it provides a good benefit for the increased complexity of the chip.

Region based memory management was first conceived in 1967 and is achievable by any programming language that lets you manage memory yourself.

Mutable value semantics in native code have been available since at least 1980 with Ada.

Lifetimes in Cyclone seem the best example of PL research in the last 50 years you have there, as it’s only 20 years old.

Overall, I’m still unsure if this list proves the point that there is active useful research in the PL space or if it proves that there’s very little in the PL space to research. More research is probably required.

> STM is about hardware, not programming languages

As others have pointed out, there's STM research in PL, it's not entirely about hardware. (The link I gave wasn't great, sorry.)

> Mutable value semantics in native code have been available since at least 1980 with Ada.

Could you link to the relevant docs? I wasn't aware Ada had anything like this.

Is this implemented under the hood with deep copying? Because if so, that would explain why it hasn't started to catch on anywhere until now. Swift and Hylo have much more efficient implementations that "copy all the time".

https://www.jot.fm/issues/issue_2022_02/article2.pdf

> Region based memory management was first conceived in 1967

There's active research in this general space. I met someone who was working in it on a train, though I forget the details.

> and is achievable by any programming language that lets you manage memory yourself

Sure. I mean Rust lifetime discipline is "achievable" in C too, so long as you're very very careful.

> Lifetimes in Cyclone seem the best example of PL research in the last 50 years you have there, as it’s only 20 years old.

It typically takes 10+ years for PL research to go from papers to research languages to being incorporated into"real" languages.

> More research is probably required.

Always.

> STM is about hardware, not programming languages

STM is not about hardware, it's literally "software transactional memory" and is meant to be implemented in software without hardware support (beyond a CAS instruction or a similar set of instructions, perhaps). As a software component it could be part of libraries or part of a programming language as part of that language's general concurrency model.

The paper that OP linked to is indeed about hardware transactional memory, but STM is a variation of transactional memory that is implemented entirely in software.
Rust compiler enables trivial parallelism by enforcing multiple readers ^ writer; and it’s beautiful.

See Rayon.

I wouldn't say it's trivial but yes it's there and it's very helpful.
There are many cases where you can replace a call to `.iter()` in your Rust code to a call to `.par_iter()` from `rayon`. Those cases are trivial, and it's great.
That's a very nice feature but I don't know that it belongs in the category of new ideas from programming language theory. Fortran 90 had automatically parallelized language constructs and OpenMP added them to C and C++ as compiler intrinsics in 2002.
Yes, that one I like a lot.
Except for all the research around functional programming?

This is like when I hear people claim that physics has not advanced in the last 50-70 years.

Futhark (https://futhark-lang.org) is an example of a functional language specifically designed for writing "normal" high-level functional code that compiles to parallel execution.
Genuine question-- is there something about Futhark that makes it particularly well-suited for parallel execution compared to any other functional programming language (especially of the purely functional kind)? FP in general is inherently well suited for this application.

As I understand it, Futhark aims to leverage GPUs in particular, and that approach seems to be what makes it unique within the category of FP languages?

Functional languages have a core that is well suited for parallelism, but all mainstream functional languages have various features that are unsuitable for parallel execution - particularly if you want to target limited machines like GPUs.

The Futhark programming model is standard functional combinators: map, reduce, scan, etc. You can do that in any Functional language, and it is mostly trivial to rewrite a Futhark program in Haskell or SML or OCaml. Futhark removes a lot of things (like recursive data structures) to make efficient parallel execution easier (and even possible), and adds various minor conveniences. But you don't really need to add something for functional programming to be suitable for parallelism; you just have to avoid certain common things (like linked lists or laziness).

Functional programming is not based off how hardware is implemented. Serial execution of instructions and mutating chunks of memory at a time are all core parts of how the hardware works which aren't functional. Doing graph reduction and making tons of copies will be slow.
Are we still doing this stupid ass reasoning around FP? Is the CPU really that serial, when it literally reschedules your instructions based on the graph of their interconnections?

Also, just think about all the optimizations your "serial" programming language does -- are your yourself really write all those mutations? Or is that the compiler, that in many cases can do a much better job? Now what if the language's semantics allowed even more freedom for the compiler in exchange for more restrictions on the language? Sure, we still don't have "sufficiently advanced compilers" that would replace programmers, but FP languages absolutely trade blows with most imperative languages in many different kinds of problems. Very visibly when parallelism comes to the picture, as it turns out, a smart parallel-aware data structure will easily beat out their serial counterparts here.

> "Is the CPU really that serial, when it literally reschedules your instructions based on the graph of their interconnections?"

Yes. Yes, it is because all of that rescheduling and reordering is completely hidden at great effort and expense to make it seem like the instruction stream is executing in exactly in the order specified. If it weren't, lines of code would essentially execute in an indeterminate order and no programs would function.

Hardware was for a very long time a limiting factor in the practicality of FP. For most general applications, today this is a relative non-issue.

FP is also particularly well suited for cloud computing and parallel computation.

"FP" camps tend to come in two flavors: "I'm a mathematician writing a computer program, and all problems will be made to look like math problems even if it means the program becomes an inscrutable mess of types and dense syntax" and "functional-ish idioms are included". The latter is useful, sometimes, for cloud computing and parallel computation; the former tends to have too many problems (slow build, slow execution, poor jargon laden syntax, etc.) to be very useful outside of academia.

I am (perhaps obviously) biased, here, but I tend to just roll my eyes whenever any of my colleagues suggests we should use functional programming to solve a problem. There are actually very few real-world use cases where it's objectively better.

> all problems will be made to look like math problems

All problems that can be solved with code are math problems. Proofs and programs are isomorphic (see the Curry-Howard correspondence).

---

Edit: this is a factually accurate comment, delivered dispassionately. It's not controversial or new-- it's something we've known for longer than the C language has existed. Why the downvote? Like I said, see this: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon...

Factually accurate (to some degree), but almost completely irrelevant.

First, not all problems that can be solved with code are math problems. Take driving a serial port, for instance. There might be some aspects of it that are mathematical, but mostly it's a matter of interfacing with the chip spec.

Second, even for problems that are isomorphic to a math problem... the thing about isomorphisms is that they aren't identical. One form or the other is often easier to deal with. (That's one of the reasons we care about isomorphisms - we can turn hard problems into easier ones.)

Well, which is easier, writing programs or doing proofs? Almost always, writing programs is easier. This is why it's (almost) completely irrelevant.

Nobody wants to write code that way. So nobody cares. They're not going to care, either, no matter how forcefully you point out Curry-Howard.

Now, as you say elsewhere, you can gain insights from math that can change your code. That's true. But I suspect that most of the time, you don't actually write the code as a proof.

This is like saying all physical engineering problems are quantum mechanics (or insert other physics theory) problems. It’s technically correct (the best kind of correct), but misleading and useless as a matter of practice.
Interesting. Functional programming is the only strategy I've seen be successful at building business software. (Hint: you can do FP in Java and it's not even awkward, and SQL is inherently functional).
I work at a FAANG-like company. The code base has almost no functional programming paradigms deployed. It’s a multi-billion dollar company from which I, an IC cog in the machine employee, became a multimillionaire through from the IPO. It’s wildly successful.
Is “wanting evidence for constant silver bullet claims that never actually pan out” really “obviously biased”?

It’s not just you. Functional Programming really does not adequately solve any of the problems that its advocates claim while refusing to provide any evidence.

And it’s not “biased” to write these claims off.

I don't know what "silver bullet" claims you've heard, but I'm not sure how that is relevant to this thread. I don't think I've made any outlandish claims, or any claims that aren't substantiated by a preponderance of academic literature.
Then why has it been slowly incorporated into literally every mainstream PL? You do know you don't have to be an extremist, and your code can contain both FP, OOP and imperative parts, whichever makes the most sense? FP and OOP are not even opposites.

They absolutely solve real issues, and it's just sticking your head into sand to say otherwise.

The parent comment was talking about aligning programming languages with the hardware. I am not commenting about the viablity of those languages, but rather that if your goal is to write the most performant code by understanding the strengths and weaknesses of the hardware than using fp concepts is not the way to do it.
I feel like in both of your comments you've changed the topic slightly. I responded to the following comment, which I interpreted literally:

> Very little innovation in programming languages has happened regarding new realities at the hardware level especially transition from serial to parallel execution

Well... if code were pure (in the FP sense), then a "sufficiently smart compiler" could move it around to extract the maximum performance.

But, as always, the sufficiently smart compiler never shows up. So we're left with the humans doing the tuning, and as you say, FP is kind of antithetical to that approach.

But how will language performance evolve as the nature of the hardware our programs run on evolves? IMHO this is not an easy question to answer right now.

C compilers don’t produce 100% optimal assembly language in all cases, but typically the assumptions they make are light. The executable code they output is somewhat predictable and often close enough to hand-optimised assembly in efficiency that we ignore the difference. But this whole approach to programming was originally designed for single-threaded execution on a CPU with a chunk of RAM for storage.

What happens if we never find a way to get a single core to run much faster but processors come with ever more cores and introduce other new features for parallel execution? What happens if we evolve towards ever more distributed systems, but farming out big jobs to a set of specialised components in the cloud at a much lower level than we do today? What happens if systems start coming with other kinds of memory that have different characteristics to RAM as standard, from content-addressable memory we already have today to who-knows-what as quantum technology evolves?

If we change the rules then maybe a different style of programming will end up being more efficient. It’s true that today’s functional programming languages that control mutation and other side effects usually don’t compile down to machine code as efficiently as a well-written C program can. The heavier runtimes to manage responsibilities like garbage collection and the reliance on purely functional data structures that we don’t yet know how to convert to efficient imperative code under the hood are bottlenecks. But on the other hand, those languages can make much stronger assumptions than a lower-level language like C in other ways, and maybe those assumptions will allow compilers to safely allocate different behaviour to new hardware in ways that weren’t possible before, and maybe dividing a big job into 500 quantum foobar jobs that each run half as fast as a single-threaded foobaz job still ends up doing the job 200x faster overall.

You're mixing up functional programming with an execution model for functional languages. These are not the same.
Research around functional languages involves their execution models. Even ignoring execution models immutability is a staple of functional programming which is not good for performance.
For a field in which a large fraction (if not a majority) of the people in industry have (nominally at least) science degrees, CS research takes a fairly long time to penetrate into industry. Rust 1.0 had few, if any, features that weren't demonstrated in academia 30 years earlier.
It is a lot easier to write a paper demonstrating some feature than to write a production-quality ecosystem based on that feature. There isn't much either the academic side nor the engineering side can do about that.

And it's not like every academic idea that worked in a paper has worked as well as hoped when someone tried to turn it into a production-quality ecosystem.

The central core of Rust is mutability NAND sharing, enforced by a borrow checker. Was that really established 30 years prior? I thought that came from the Cyclone research language (v1 release 2006) which was only a few years prior to the initial stages of Rust (late 2000s, depending on how you count it)
Does anyone have any good counterpoints to this? From my, naive, perspective, this seems to be relatively true.

I've always assumed that, by now, I would be able to write code, in a semi mainstream language, and it would be made somewhat parallel, by the compiler. No need for threads, or me thinking of it.

There's projects like https://polly.llvm.org, but I guess I assumed there would be more progress through the decades.

Proving legality of transformations in the compiler is frequently impossible. Consequently, the main mode of implementation has been to essentially think of the problems in terms of the user saying that this loop is parallel, please make it run in parallel. OpenMP or Rust's rayon crate, for example. The other similar innovation has been programming SIMD as if each lane were an independent thread, which is essentially the model of ispc or CUDA (or #pragma omp simd, natch).

The other big impossible task is that most code isn't written to be able to take advantage of theoretical autoparallelization--you really want data to be in struct-of-arrays format, but most code tends to be written in array-of-struct format. This means that vectorization cost model (even if proven, whether by user assertion or sufficiently smart compiler, legal) sees it needs to do a lot of gathers and scatters and gives up on a viable path to vectorization really quickly.

Maybe some history will help here too. In the 90s, the data model of most programming languages wasnt even array-of-structs, but array-of-pointers to otehr pointers to other pointers...

And the majority of software we've inherited is written this way.

In the 90s this didnt matter, since dereferencing a pointer was comparably expensive to arithmetical operations. But with modern CPUs with massive caches and more native parallelisation, the difference is dramatic.

So, even now, the majority of languages we're using; and almost all code we've inherited today, are as far away as you can get from efficiently using modern CPUs.

The task is first to change all these languages to enable ergonomic programming without tons of indirection -- we're very far away from even providing basic tools for performant code

It's not totally clear what you're looking for.

As you noted, polyhedral compilers work on a pretty restricted subset of programs, but are fairly impressive in what they do. There has been research on distributed code generation [1] as well as GPUs [2]. While there has been work on generalizing the model [3], I think the amount of parallelization that a compiler can do is still very limited by its ability to analyze the code (which is to say, highly restricted).

Then you've got a large class of data-parallel-ish constructs like Rayon [4] as well as executors which may work their way into the C++ standard at some point [5]. How much safety these provide depends greatly on the underlying language. Generally speaking, the constructs here are usually pretty restricted (think parallel map), but often you can write more-or-less arbitrary code inside, which is often not the case in the polyhedral compilers.

If you don't care so much about safety and just want access to every parallel programming construct under the sun, Chapel [6] may be interesting to you. There is no attempt here, as best I can tell, to offer any sort of safety guarantees, but maybe that's fine.

On the other end of the spectrum you have languages like Pony [7] that do very much care about safety, but (I assume, haven't looked deeply) this comes with tradeoffs in expressiveness.

(I work in this area too [8].)

Overall, there are some very stringent tradeoffs involved in parallelizing code and while it certainly has been and continues to be a very active area of research, there's only so much you can do to tackle fundamentally intractable analysis problems that pop up in the area of program parallelization.

[1]: https://www.csa.iisc.ac.in/~udayb/publications/uday-sc13.pdf

[2]: https://arxiv.org/pdf/1804.10694.pdf

[3]: https://inria.hal.science/file/index/docid/551087/filename/B...

[4]: https://docs.rs/rayon/latest/rayon/

[5]: https://github.com/NVIDIA/stdexec (disclaimer: I did a quick Google search on this, not 100% sure this is the best link)

[6]: https://chapel-lang.org/

[7]: https://www.ponylang.io/

[8]: https://regent-lang.org/

There is also HVM [9], which can run sequentially written code parallel for some degrees. (It can run parallel some sequential Haskell code naively transpiled to HVM that GHC doesn't parallelize.)

[9] : https://github.com/HigherOrderCO/HVM

Languages like Erlang (or Elixir) that naturally split programs into isolated processes with local state and that communicate via message passing map well onto multi core systems. No need to have the compiler figure that out for you - instead it is expressed directly in the code.
One of the prerequisites to this is for the mainstream to stop doing things which are incompatible with concurrency.

Mutation (and other effects) makes the order of computations important. If you're writing to and reading from variables, the compiler is not free to move those operations around, or schedule them simultaneously.

And you probably don't want to be rid of all mutation. So what if you separated the mutating from the non-mutating? Well you'd need a sufficiently powerful type system. Likely one without nulls - as they can punch a hole through any type checking.

If you want this stuff in the mainstream, you at least have to get all the nulls and mutation out of the mainstream, which I don't think will happen.

The industry for the most part heeded "goto considered harmful" (1968), but hasn't done so with "the null reference is my billion dollar mistake (2009)". Maybe we just have to wait.

Especially when doing things like mapping or list comprehension. I'd love to be able to do operations on collections in parallel by simply marking my functions as pure. Just a simple xs.map {x -> f(x)} combined with "f" marked as pure confirmed by the compiler to make the magic happen.
It isn't really. There have been plenty of innovations in parallel computing:

* Go, with goroutines and heavy use of channels.

* Rust which is free of data races and generally improves the safety of multithreaded programming via Sync, Send and just safer APIs (e.g. Mutex).

* Chapel, which is a language designed primarily for multithreaded and multiprocess computing.

Those are just the ones I know about. There's obviously way more.

Take a look at Halide, which can autovectorize and multi-thread graphics computations (but does require a restricted language).
I was thinking outside of "embarrassingly parallel" [1] type work. :) But, that is fair.

[1] https://en.wikipedia.org/wiki/Embarrassingly_parallel#:~:tex....

Array languages, such as APL and others, tend to be easily parallisable given primitives tend to focus on intent and how to transform data rather than imperative operations.

Some of the SIMD operations feel very reminicent of APL primitives

So your best bet for a modern language is one from 1966?
Just because something's old doesn't mean it's not relevant, yes might not've been the best language to suggest - the more modern BQN, Uiua and Singeli exist too - but it's still a fairly niche paradigm. Ideas tend to come in cycles too - look at the 1980s ideas of the transputer or connection machine.

I wanted to point towards a programming paradigm that's approach enables you to take advantage of the parallel execution possible within chips today due to the notation being both precise in intent yet vague in execution. Take summing an array (`+/vector`) or selecting values given a boolean mask (`mask/values`) - both these very simple expressions are expressible directly in SIMD instructions, as there's no for loop index enforcing an order.

I feel this comment is pequant and jejuon. Of the distributed nature of computing has made it's way into programming language research.
Piquant and jejune?