Hacker News new | ask | show | jobs
by victorcase 3494 days ago
I will search about Tilera :) Your answer open my mind cause I hadn't thought about hardware. It got me thinking.. Are GPU architecture more friendly to be immutable or It have the same "problem" of shared memory too? And.. by the way thanks to sharing your knowledge :)
3 comments

If anything, GPU architectures are even less suited to immutability than CPUs. To get good performance out of a GPU (otherwise, why use them?), you have to avoid thread divergence caused by data-dependent branches as much as possible (I'm simplifying a bit). The simplest and most effective way to do this is to allocate a couple of large arrays and work on them. Immutable data structures, garbage collection, and frequent allocation of small objects, in contrast, will cause a lot of thread divergence.

In addition, immutable data structures tend to be pointer-based. For GPUs (even more so than for CPUs), you want contiguous memory accesses for acceptable performance.

GPUs deal primarily with "embarrassingly parallel" worksets – those worksets where the data can be trivially divided into unrelated chunks, so mutability is not really a concern (since working memory isn't shared). In fact it's typical for an algorithm not to modify its input, and to write its output to a separate buffer. All the synchronization happens at the start and end of the algorithm, when work is passed to/from the CPU.

While it is possible for threads in a (modern) GPU to modify shared memory, it's typically very costly. Usually it takes the form of one thread running at the exclusion of all others, to do something like collecting the results from other threads after they're done running.

Re: Tilera, looks like they got bought out by Mellanox and haven't progressed past where I last kept track of them: http://www.mellanox.com/page/products_dyn?product_family=238... It was a shame, because the IPC mesh was easily twice as fast as the shared memory solutions they were pushing in the Gx series.

Functional array programming on GPUs works pretty well, and is an immutable programming style. Your object granularity tends to be pretty big (entire arrays), so the low-level shared memory concerns that inevitably arise at some point in the software stack are easy to handle for the compiler and/or runtime. Basically, you have much fewer synchronisation points, so you can afford for them to be slow.

It's a more restricted programming style than common functional programming - in particular, recursion and advanced control flow is out - but that's the name of the game on GPUs, no matter what you do.

And there are at least Haskell and F# implementations for GPGPU, not sure about other languages.