by derf_
931 days ago
Memory pressure is even worse on GPUs. I did some work to generalize Blelloch to 2D parallel prefix sums for integral image computation back in 2008 [1], and the number of memory accesses really dominates. On a GPU, for sufficiently small problems the number of passes matters more, and it is worth using a simpler, non-work-efficient algorithm to reduce setup overheads.

[1] https://people.xiph.org/~tterribe/pubs/gpusurf.pdf Section III.A
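To make the passes-versus-work tradeoff concrete, here is a rough CPU-side sketch in Python that simulates two 1D scans and counts their passes and additions. It is not the code from [1]: the function names `hillis_steele_scan` and `blelloch_scan` are made up for illustration, the simple non-work-efficient variant is assumed to be a Hillis-Steele style scan, and each loop iteration over a stride stands in for what would be one GPU pass.

```python
def hillis_steele_scan(xs):
    """Inclusive scan: about log2(n) passes, O(n log n) additions."""
    out = list(xs)
    n = len(out)
    passes = adds = 0
    offset = 1
    while offset < n:
        prev = out[:]  # read from the previous pass (double buffering)
        for i in range(offset, n):
            out[i] = prev[i] + prev[i - offset]
            adds += 1
        offset *= 2
        passes += 1
    return out, passes, adds


def blelloch_scan(xs):
    """Exclusive scan: 2*log2(n) passes, O(n) additions.

    Assumes len(xs) is a power of two to keep the sketch short.
    """
    out = list(xs)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    passes = adds = 0

    # Up-sweep (reduce): build partial sums in place.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            out[i] += out[i - stride]
            adds += 1
        stride *= 2
        passes += 1

    # Down-sweep: clear the root, then push prefixes back down the tree.
    out[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = out[i - stride]
            out[i - stride] = out[i]
            out[i] += left
            adds += 1
        stride //= 2
        passes += 1
    return out, passes, adds


if __name__ == "__main__":
    data = list(range(1, 9))
    print(hillis_steele_scan(data))
    print(blelloch_scan(data))
```

For an 8-element input the simple scan needs half as many passes but does more additions, which is the tradeoff described above: with small inputs the per-pass setup cost can outweigh the extra arithmetic.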