| HN Mirror


by zanny 4807 days ago

We should be leveraging the APU compute engine in OpenGL and OpenCL tasks to get more performance out of the system (maybe not GL, since it took a decade for ATI/AMD and Nvidia to properly support just having two of the exact same card work in parallel).

In practice, with the advent of compute shaders and OpenCL, there is very little intense serial work that can't be done in parallel. My workflow nowadays is (using Python as an example):

Slow? (assuming we know why it is slow, and it isn't just needlessly bad algorithmic complexity, like an exponential runtime where a quadratic algorithm would do) -> Put it in C++.

Still slow? -> Parallelize into tasks and put them in a worker thread pool (assuming there is a lot of this kind of work happening; otherwise just have numcores / X threads do the work).

Still slow? -> Port it over to OpenCL (or, if memory copies are unnecessary, OpenGL compute shaders) and keep the old implementation for backwards compatibility with systems lacking them.
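As a rough illustration of the "parallelize into tasks" step, here is a minimal Python sketch using only the standard library. The sum-of-squares workload and the function names (`sum_squares_serial`, `sum_squares_pooled`, `chunked`) are made up for the example, and the pool is sized from `os.cpu_count()` as a stand-in for "numcores". Note that in standard CPython, threads only speed up CPU-bound work when the heavy part releases the GIL, which is what you would get after moving the hot loop into a C++ extension.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sum_squares_serial(values):
    """Baseline implementation, kept around as the fallback path."""
    return sum(v * v for v in values)


def chunked(values, n_chunks):
    """Split values into at most n_chunks contiguous slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_squares_pooled(values, workers=None):
    """Same result as the serial version, computed as tasks on a thread pool."""
    workers = workers or os.cpu_count() or 1
    # Not worth a pool for tiny inputs or a single core: use the old path.
    if workers < 2 or len(values) < 2:
        return sum_squares_serial(values)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per chunk; partial sums are combined at the end.
        return sum(pool.map(sum_squares_serial, chunked(values, workers)))
```

The serial function stays in place as the fallback, in the same spirit as keeping the old implementation around when porting to OpenCL: the faster path is an optimization, not a replacement.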