Hacker News
by slededit 3613 days ago
You really do need these changes to get maximum parallelism, though. Where it shines is in situations where you'd otherwise be porting to a GPU. On the Phi, it's a recompile plus adding a few intrinsics to your inner loops. This is much faster than getting reasonable performance on a heterogeneous architecture, and you don't have to micro-manage the slow PCIe link between the CPU and the GPU.
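
To give a rough idea of what "a few intrinsics in your inner loops" can look like, here is a minimal sketch, not code from any real port. It assumes an AVX-512F target (such as the later Knights Landing Phi; the first-generation cards used a different 512-bit instruction set) and falls back to plain scalar C elsewhere. The function name `saxpy_sketch` and the choice of a SAXPY loop are purely illustrative.

```c
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * saxpy_sketch is a hypothetical name used only for this illustration. */
void saxpy_sketch(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
#ifdef __AVX512F__
    /* Vector path: 16 floats per iteration, unaligned loads and stores. */
    const __m512 va = _mm512_set1_ps(a);
    for (; i + 16 <= n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);
        __m512 vy = _mm512_loadu_ps(y + i);
        vy = _mm512_fmadd_ps(va, vx, vy);   /* va * vx + vy */
        _mm512_storeu_ps(y + i, vy);
    }
#endif
    /* Scalar path: handles the tail, or the whole array without AVX-512F. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The rest of the program stays ordinary C on the same memory, which is the point being made above: there is no separate device buffer to copy to and from over PCIe.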