by tmurray
4980 days ago
(Full disclosure: I used to work for NV on CUDA and did very extensive work on Titan, so I am probably biased.)

If you think your existing MPI app is going to automatically scale to a heterogeneous architecture (high-power x86 on the main CPU, Xeon Phi cores on the accelerator) and get acceptable performance, sorry, it's not going to happen. The fundamental constraints on 2012/2013 Xeon Phi performance that determine how apps should be written are exactly the same as those on current desktop GPUs: small, high-latency local memory that is not coherent with the rest of the system; a relatively slow, high-latency link to the CPU; ugly interactions with network cards in most environments; and a fundamental need to hide memory latency at all times.

For any sort of performance beyond a standard Xeon, you're going to want to run a Xeon Phi as a targeted accelerator rather than offloading entire processes to it and using a standard MPI stack. This means you're going to be running in a hybrid host/device mode and using compiler directives or a specific parallel language and API to deal with on-chip execution and data transfer, which puts you in exactly the same solution space as with GPUs.

In other words: the Phi of today is not a panacea. You get better tools and more flexibility in terms of the programming model, but the fast path that any of its intended market would use in applications looks identical to the one for GPUs.
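
To make the "hybrid host/device mode with compiler directives" point concrete, here is a rough sketch of what a targeted offload looks like. This is an assumption on my part for illustration: it uses OpenMP `target` directives as a stand-in for whatever directive set your toolchain actually provides, and `saxpy_offload` is a hypothetical kernel name, not anything from a real codebase. The thing to notice is that the data movement over the slow link is spelled out explicitly in the `map` clauses, which is the same thing you end up doing for a GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: computes y = a*x + y in a single offload region.
 * Assumption: OpenMP target directives stand in for the vendor's own
 * offload directives. A compiler without offload support ignores the
 * pragma or falls back to the host, so the loop still runs correctly. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* map(to: ...) copies x to the device before the region runs;
     * map(tofrom: ...) copies y in and copies the result back out.
     * Both transfers cross the host/accelerator link, so a real app
     * would batch as much work as possible per transfer. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The host side calls `saxpy_offload(n, a, x, y)` like any other function; only the hot loop and its arrays go to the accelerator, while the rest of the process (MPI included) stays on the CPU.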