by JonChesterfield
815 days ago
I don't think the instruction encoding is a significant problem. Cache coherency really might be. A current x64 chip is a dozen or so separate dies with eight or so x64 cores per die, and a system can have a couple of those chips in different sockets. When one thread on one core decides to write to a cache line, the memory model makes really strong guarantees about cores on some other socket noticing that change. Arm doesn't have to go with total store order. GPUs involve distinct blocks of memory with their own invariants on when caches are invalidated, at potentially very coarse granularity (for example, no change will be seen until after a kernel has finished executing, where a kernel is essentially a process that sprang to life and then did arbitrary amounts of maths). Fast x64 code is prone to carefully partitioning the problem across different cores and trying not to touch a cache line owned by another core, but even then you still have something like MOESI sitting in the background, waiting just in case some thread mutates the instructions executing on another one.
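To make the partitioning point concrete, here is a minimal C++ sketch of the usual trick: give each thread its own counter, padded out to a full cache line, so no two cores ever write to the same line. The 64-byte line size and the names `kAssumedLineSize`, `PaddedCounter` and `count_partitioned` are my assumptions for illustration, not something taken from a particular codebase, and the real line size depends on the chip.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines, which is typical for current x64 parts
// but not guaranteed everywhere.
constexpr std::size_t kAssumedLineSize = 64;

// One counter per thread, aligned and padded so that each instance
// occupies its own cache line (hypothetical helper type).
struct alignas(kAssumedLineSize) PaddedCounter {
    std::uint64_t value = 0;
};

// Each thread increments only its own slot; the slots are summed after
// all threads have been joined, so no atomics are needed.
std::uint64_t count_partitioned(unsigned num_threads,
                                std::uint64_t iters_per_thread) {
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iters_per_thread] {
            for (std::uint64_t i = 0; i < iters_per_thread; ++i) {
                counters[t].value += 1;  // touches only this thread's slot
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join orders each thread's writes before the reads below
    }

    std::uint64_t total = 0;
    for (const auto& c : counters) {
        total += c.value;
    }
    return total;
}
```

Calling `count_partitioned(4, 100000)` should return 400000. Without the `alignas`, several counters would likely share one line and the coherency protocol would bounce that line between cores on every write, which is the cost the partitioning is trying to avoid. The hardware still keeps the coherency machinery running either way; the padding only keeps it idle.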