| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by colanderman 3494 days ago

Computer memory is mutable. For multiple processors to make use of the same memory requires contending for that memory somehow (else the processors are not communicating in any meaningful way).

You can mathematically try to hide that however you want, but somewhere deep in your language runtime stack will be a shared-memory concurrent mutable algorithm, be it a lock, queue, or otherwise.

The alternative is to ditch shared memory entirely and use e.g. hardware IPC to manage contention. Tilera tried that, they had 64-core processors with a hardware IPC mesh and incoherent cache. But last I checked they switched focus to coherent shared-memory concurrency. Not sure what the driver for the change was but it's telling.

EDIT: Not to mention that half the article focused on promise-style concurrency, which has nothing to do with shared memory or even multicore. Immutability does nothing to address that I/O exists and has latency, and that people try to hide it in frankly bizarre ways. (Really… not pumping the event loop to avoid deadlock? That's a design issue, you should be using queues instead of mutexes and eliminate cycles if you find you have that problem.)

2 comments

eternalban 3494 days ago

> Computer memory is mutable. For multiple processors to make use of the same memory requires contending for that memory somehow (else the processors are not communicating in any meaningful way).

That's a bit of a strawman. For example, content addressable memory and message passing will certainly permit "processors ... communicating in [a] meaningful way".

link

colanderman 3494 days ago

That's why I called out message passing specifically in the very next paragraph.

link

victorcase 3494 days ago

I will search about Tilera :) Your answer open my mind cause I hadn't thought about hardware. It got me thinking.. Are GPU architecture more friendly to be immutable or It have the same "problem" of shared memory too? And.. by the way thanks to sharing your knowledge :)

link

adwn 3494 days ago

If anything, GPU architectures are even less suited to immutability than CPUs. To get good performance out of a GPU (otherwise, why use them?), you have to avoid thread divergence caused by data-dependent branches as much as possible (I'm simplifying a bit). The simplest and most effective way to do this is to allocate a couple of large arrays and work on them. Immutable data structures, garbage collection, and frequent allocation of small objects, in contrast, will cause a lot of thread divergence.

In addition, immutable data structures tend to be pointer-based. For GPUs (even more so than for CPUs), you want contiguous memory accesses for acceptable performance.

link

colanderman 3494 days ago

GPUs deal primarily with "embarrassingly parallel" worksets – those worksets where the data can be trivially divided into unrelated chunks, so mutability is not really a concern (since working memory isn't shared). In fact it's typical for an algorithm not to modify its input, and to write its output to a separate buffer. All the synchronization happens at the start and end of the algorithm, when work is passed to/from the CPU.

While it is possible for threads in a (modern) GPU to modify shared memory, it's typically very costly. Usually it takes the form of one thread running at the exclusion of all others, to do something like collecting the results from other threads after they're done running.

Re: Tilera, looks like they got bought out by Mellanox and haven't progressed past where I last kept track of them: http://www.mellanox.com/page/products_dyn?product_family=238... It was a shame, because the IPC mesh was easily twice as fast as the shared memory solutions they were pushing in the Gx series.

link

Athas 3494 days ago

Functional array programming on GPUs works pretty well, and is an immutable programming style. Your object granularity tends to be pretty big (entire arrays), so the low-level shared memory concerns that inevitably arise at some point in the software stack are easy to handle for the compiler and/or runtime. Basically, you have much fewer synchronisation points, so you can afford for them to be slow.

It's a more restricted programming style than common functional programming - in particular, recursion and advanced control flow is out - but that's the name of the game on GPUs, no matter what you do.

link

pjmlp 3494 days ago

And there are at least Haskell and F# implementations for GPGPU, not sure about other languages.

link