|
|
|
|
|
by DeepDuh
4982 days ago
|
|
I've done my master thesis on GPGPU, so maybe I can help out a bit. I'm not yet too familiar with Epiphany's design however. From what I could grasp what sets them apart the most is a different memory architecture compared to multicore CPUs, where the individual cores seem to be optimized for accessing adjacent memory locations as well as the locations of the direct neighbors. This is one point where the architecture seems to be similar to GPUs, although GPUs have a very different memory architecture again - for the programmer it might look similar however, especially when using OpenCL. The main point where Epiphany is diverging from GPUs is that the individual cores are complete RISC environments. This could mainly be a big plus when it comes to branching and subprocedure calls (although NVIDIA is catching up on the later point with Kepler 2). On GPUs the kernel subprocedures currently all need to be inlined and branches mean that the cores that aren't executing the current branch are just sleeping - Epiphany cores seem to be more independent in that regard. I still expect an efficient programming model to be along the same lines as CUDA/OpenCL for epiphany however - which is a good thing btw., this model has been very successful in the high performance community and it's actually quite easy to understand - much easier than cache optimizing for CPU for example. If we compare epiphany to CPU what's mainly missing is the CPU's cache architecture, hyperthreading, long pipelines per core, SSE on each core, possibly out-of-order and intricate branch prediction (not sure on those last ones). The missing caches might be a bit of a problem. The memory bandwidth they specify seems pretty good to me, but from personal experience I'd add another 20-30% to the achievable bandwidth if you have a good cache (which GPU has since Fermi for example). The other simplifications I actually like a lot - to me it makes much more sense to have a massive parallel system where you can just specify everything as scalar instead of doing all the SSE and hyperthreading hoops like on CPUs - optimizing for CPU is quite a pain compared to those new models. |
|
In the meantime, leave the fast atomic ops, ECC, and full IEEE compliance to the GPUs and Xeon Phis of the world until you have the transistor budget to go after them...
All IMO of course...