by enisberk
728 days ago
This is really cool work! Congrats on both the paper and the graduation! A long time ago, I worked on optimizing broadcast operations on GPUs [1]. Coming up with a strategy that promises high throughput across different array dimensionalities is quite challenging. I am looking forward to reading your work.

[1] https://scholar.google.com/citations?view_op=view_citation&h...
|
Thanks! I still have to actually graduate, though, and the paper is in review, so maybe your congratulations are a bit premature! :)
> A long time ago, I worked on optimizing broadcast operations on GPUs [1].
Something similar happens in Futhark, actually. When something like `[1,2,3] + 4` is elaborated to `map (+) [1,2,3] (rep 4)`, the `rep` is eliminated by pushing the `4` into the `map`: `map (+4) [1,2,3]`. Futhark then ultimately compiles it to efficient CUDA/OpenCL/whatever.
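
To make the rewrite concrete, here is a rough Python sketch of the two forms. It is only an illustration of the idea: the function names (`rep`, `add_scalar_naive`, `add_scalar_fused`) are made up for this example, and nothing here reflects how the Futhark compiler is actually implemented.

```python
from operator import add


def rep(value, n):
    """Hypothetical helper: materialize a length-n array filled with value."""
    return [value] * n


def add_scalar_naive(xs, c):
    # Elaborated form, roughly map (+) xs (rep c):
    # the scalar is first replicated into a full intermediate array.
    return list(map(add, xs, rep(c, len(xs))))


def add_scalar_fused(xs, c):
    # Rewritten form, roughly map (+c) xs:
    # the scalar is pushed into the mapped function, so no
    # intermediate replicated array is ever built.
    return [x + c for x in xs]


naive = add_scalar_naive([1, 2, 3], 4)
fused = add_scalar_fused([1, 2, 3], 4)
print(naive == fused)
```

Both versions compute the same result; the second just skips allocating the replicated array, which is the part that matters once the arrays are large and live on a GPU.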