by BruceIV
4735 days ago
I've only just started: I'm less than a year into my PhD, so I don't have anything published yet. My current project involves some non-trivial calculations on a large matrix of multi-precision integers. The obvious way to lay out the data (an array of dynamically resizable vectors) is dead wrong for the GPU: the memory bandwidth is completely swamped by fetching from a different pointer for each thread, and I got a 100x slowdown relative to the CPU reference implementation. So right now I'm redoing the experiments with a vector of fixed-length arrays, where the 0th elements of all the vectors from the first formulation go in one array, the 1st elements in the next, and so forth.

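To make the layout change concrete, here is a rough sketch in plain Python (not GPU code). It assumes 32-bit limbs stored least significant first and a fixed upper bound on the number of limbs; the names `to_limbs`, `to_limb_major` and `from_limb_major` are made up for illustration and are not from my actual code.

```python
LIMB_BASE = 2**32  # assumed limb size; any fixed base works the same way


def to_limbs(value):
    """Split a non-negative int into limbs, least significant first."""
    limbs = []
    while value:
        limbs.append(value % LIMB_BASE)
        value //= LIMB_BASE
    return limbs or [0]


def to_limb_major(numbers, num_limbs):
    """Turn per-integer limb lists into one fixed-length array per limb position.

    numbers[i][k] is limb k of integer i (the "array of vectors" layout).
    The result satisfies planes[k][i] == limb k of integer i, zero-padded.
    """
    planes = [[0] * len(numbers) for _ in range(num_limbs)]
    for i, limbs in enumerate(numbers):
        if len(limbs) > num_limbs:
            raise ValueError("integer %d needs more than %d limbs" % (i, num_limbs))
        for k, limb in enumerate(limbs):
            planes[k][i] = limb
    return planes


def from_limb_major(planes, index):
    """Reassemble integer `index` from the limb-major layout."""
    return sum(plane[index] * LIMB_BASE**k for k, plane in enumerate(planes))


values = [5, 2**32 + 7, 2**64]
numbers = [to_limbs(v) for v in values]   # jagged: [[5], [7, 1], [0, 0, 1]]
planes = to_limb_major(numbers, num_limbs=3)
restored = [from_limb_major(planes, i) for i in range(len(values))]
```

The point of the second layout is that when thread `i` reads `planes[k][i]`, neighbouring threads read neighbouring memory locations, instead of each thread chasing its own pointer. The price is padding every integer to the same number of limbs.
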
I guess if you generalize this transformation sufficiently you end up doing whole-program flattening like NESL and DPH. Or is there a less intrusive way to pull that off?