by colanderman 3494 days ago
GPUs deal primarily with "embarrassingly parallel" workloads: those where the data can be trivially divided into independent chunks, so mutability is not really a concern (since working memory isn't shared between chunks). In fact, it's typical for an algorithm not to modify its input and to write its output to a separate buffer. All the synchronization happens at the start and end of the algorithm, when work is passed to/from the CPU.
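To make the pattern concrete, here's a rough sketch using Python threads as stand-ins for GPU threads. It's only an illustration of the structure, not how a GPU actually schedules work (and CPython's GIL means these threads won't really run CPU-bound code in parallel). The names `square_kernel` and `launch` are made up for the example. Each worker reads only its slice of the input, writes only its slice of a separate output buffer, and the single synchronization point is waiting for all chunks at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def square_kernel(inp, out, start, end):
    """Hypothetical 'kernel': reads only inp[start:end], writes only out[start:end]."""
    for i in range(start, end):
        out[i] = inp[i] * inp[i]


def launch(inp, num_workers=4):
    # The output goes to a separate buffer; the input is never modified.
    out = [0] * len(inp)
    # Split the index range into independent chunks (at least 1 element each).
    chunk = max(1, (len(inp) + num_workers - 1) // num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(square_kernel, inp, out, s, min(s + chunk, len(inp)))
            for s in range(0, len(inp), chunk)
        ]
        # The only synchronization: wait for every chunk to finish.
        for f in futures:
            f.result()
    return out


data = list(range(10))
print(launch(data))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(data)          # input unchanged: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because no two workers ever touch the same output index, no locks are needed anywhere inside the "kernel".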

While it is possible for threads in a (modern) GPU to modify shared memory, doing so is typically very costly. Usually it takes the form of one thread running to the exclusion of all others, for example to collect the results from the other threads after they've finished.
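A rough sketch of that "one thread collects the results" step, again with Python threads standing in for GPU threads (in CUDA the rough analogue would be a `__syncthreads()` barrier followed by an `if (threadIdx.x == 0)` block). The function name `parallel_sum` and the strided split are illustrative choices, not anything from a specific GPU API. Each thread writes only to its own slot, every thread waits at a barrier, and then thread 0 alone reads all the slots and combines them.

```python
import threading


def parallel_sum(data, num_threads=4):
    partials = [0] * num_threads  # one private slot per thread: no write sharing
    total = [None]                # written only by the collecting thread
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        # Phase 1: each thread sums a strided subset into its own slot.
        partials[tid] = sum(data[tid::num_threads])
        # Every thread blocks here until all partial sums have been written.
        barrier.wait()
        # Phase 2: exactly one thread collects the results; the rest just exit.
        if tid == 0:
            total[0] = sum(partials)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


print(parallel_sum(list(range(101))))  # 5050
```

The expensive part on real hardware is the barrier plus the serialized collection phase: while thread 0 is combining, every other thread is idle.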

Re: Tilera: it looks like they were bought out by Mellanox and haven't progressed past where I last kept track of them: http://www.mellanox.com/page/products_dyn?product_family=238... It was a shame, because the IPC mesh was easily twice as fast as the shared-memory solutions they were pushing in the Gx series.