Hacker News
by lightcatcher 1811 days ago
Parallel prefix sum is the most underappreciated parallel algorithm in my opinion, and this paper is the best explanation and visualization of the concept I've seen.

A few years ago I worked on a deep learning project using parallel prefix sum as a new way to accelerate recurrent neural nets on GPUs[0]. The paper in this post was the most important reference and source of inspiration. I'm happy to see this paper shared on HN in hopes that it also sparks ideas in others.

[0] https://arxiv.org/abs/1709.04057
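
For readers wondering how a recurrence can be a prefix sum at all, here is a minimal Python sketch. It assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t, and it is only an illustration of the general idea, not necessarily the formulation used in [0]. The names `combine`, `linear_recurrence_scan`, and `linear_recurrence_loop` are made up for this example. Each step is treated as an affine map, and composing affine maps is associative, so the running composition is a scan.

```python
from itertools import accumulate


def combine(left, right):
    """Compose two affine steps h -> a*h + b, applying `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    # right(left(h)) = a2*(a1*h + b1) + b2 = (a1*a2)*h + (a2*b1 + b2)
    return (a1 * a2, a2 * b1 + b2)


def linear_recurrence_scan(a, b, h0=0):
    """Compute h_t = a_t*h_{t-1} + b_t for all t as an inclusive scan over (a_t, b_t)."""
    # accumulate() runs sequentially here; because `combine` is associative,
    # any parallel scan implementation could be substituted for it.
    pairs = accumulate(zip(a, b), combine)
    return [A * h0 + B for A, B in pairs]


def linear_recurrence_loop(a, b, h0=0):
    """Reference implementation: the obvious sequential loop."""
    out, h = [], h0
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out
```

Both functions should agree on any input, e.g. `linear_recurrence_scan([2, 3, 4], [1, 1, 1], h0=1)` and the loop version both give `[3, 10, 41]`. The scan version does more arithmetic per element, which is the usual trade for exposing parallelism across the sequence.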

1 comment

I used to program on the Connection Machine. I recently tried to do some work with the Intel vector instructions and was quite frustrated by the lack of scans. We used them for _everything_.
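
For anyone who hasn't had to build a scan without a dedicated primitive, here is a rough Python sketch of the log-step shift-and-add pattern (in the style of the Hillis-Steele scan). The function name `log_step_scan` is hypothetical and a plain list stands in for the vector lanes. This only shows the data movement; it is not a claim about how the Connection Machine or any particular Intel intrinsic implements it.

```python
def log_step_scan(lanes):
    """Inclusive prefix sum using ceil(log2(n)) shift-and-add rounds."""
    x = list(lanes)
    n = len(x)
    shift = 1
    while shift < n:
        # Every lane i >= shift adds the value `shift` lanes to its left.
        # The comprehension reads only the previous round's values, which
        # mimics all lanes updating simultaneously.
        x = [x[i] + x[i - shift] if i >= shift else x[i] for i in range(n)]
        shift *= 2
    return x
```

With eight lanes this takes three rounds (shifts of 1, 2, and 4): `log_step_scan([1, 2, 3, 4, 5, 6, 7, 8])` gives `[1, 3, 6, 10, 15, 21, 28, 36]`.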