by trsohmers 3976 days ago
DRAM, or preferably a closer core. The on-chip memory is all physically addressed and is part of a flat global address space. The first 128KB of the address space is core 0's memory, the next 128KB is core 1's, and so on up to core 255. When a core accesses a memory region outside its own local scratchpad, the request hops across the network on chip (one cycle per hop) to the core that owns the needed address. The compiler tries to keep the data a core needs in that core's local scratchpad or, if it can't, as close to that core as possible. Even in the worst case, where a core needs memory in the opposite corner (core 0 accessing core 255), the access takes only 32 cycles, which is less than the ~40 cycles it takes to access L3 cache on an Intel chip.
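A rough sketch of the address mapping described above: the owning core is just the address divided by 128KB, and the remainder is the offset in that core's scratchpad. The hop count assumes the 256 cores form a 16x16 mesh numbered row-major, with routes measured as Manhattan distance; the mesh shape, the numbering, and all function names here are assumptions for illustration, not details from the comment.

```python
BYTES_PER_CORE = 128 * 1024   # 128KB scratchpad per core (from the comment)
NUM_CORES = 256               # cores 0..255 (from the comment)
MESH_WIDTH = 16               # assumption: 16x16 mesh, row-major core numbering


def owner_core(addr: int) -> int:
    """Return the core whose scratchpad holds this global physical address."""
    if not 0 <= addr < BYTES_PER_CORE * NUM_CORES:
        raise ValueError(f"address {addr:#x} is outside on-chip memory")
    return addr // BYTES_PER_CORE


def local_offset(addr: int) -> int:
    """Offset of the address within its owning core's scratchpad."""
    return addr % BYTES_PER_CORE


def coords(core: int) -> tuple[int, int]:
    """(row, col) of a core in the assumed 16x16 mesh."""
    return divmod(core, MESH_WIDTH)


def hops(src_core: int, dst_core: int) -> int:
    """Router hops between two cores, assuming Manhattan-distance routing."""
    (r1, c1), (r2, c2) = coords(src_core), coords(dst_core)
    return abs(r1 - r2) + abs(c1 - c2)


if __name__ == "__main__":
    addr = 255 * BYTES_PER_CORE + 5
    print(owner_core(addr), local_offset(addr))  # 255 5
    print(hops(0, 255))                          # 30
```

Under these assumptions, core 0 reaching core 255 takes 30 hops. The sketch counts router hops only and does not model any fixed access cost, which may account for the difference from the 32-cycle figure quoted above.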

The NoC is also entirely non-blocking: a router can read from or write to its own core's scratchpad and pass other traffic through in the same cycle.
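A toy model of that non-blocking behavior, written only to illustrate the idea: in a single `step` (one cycle), the router serves a local scratchpad read or write and also forwards a packet that is passing through, so neither stalls the other. The `Router` class, its fields, and the request format are hypothetical and say nothing about how the actual hardware is built.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Router:
    """Hypothetical single-core router; not the real hardware design."""
    core_id: int
    scratchpad: dict = field(default_factory=dict)  # offset -> value
    cycle: int = 0

    def step(self, local_op: Optional[tuple] = None,
             transit: Any = None) -> tuple[Any, Any]:
        """Advance one cycle.

        local_op: ("read", offset), ("write", offset, value), or None.
        transit: a packet headed for another core, or None.
        Returns (local read result or None, forwarded packet or None).
        """
        result = None
        if local_op is not None:
            if local_op[0] == "write":
                _, offset, value = local_op
                self.scratchpad[offset] = value
            elif local_op[0] == "read":
                result = self.scratchpad.get(local_op[1])
        # The passthrough is not delayed by the local access:
        # both complete within this same cycle.
        forwarded = transit
        self.cycle += 1
        return result, forwarded


if __name__ == "__main__":
    r = Router(core_id=5)
    print(r.step(("write", 0, 42), transit="pkt-a"))  # (None, 'pkt-a')
    print(r.step(("read", 0), transit="pkt-b"))       # (42, 'pkt-b')
    print(r.cycle)                                    # 2
```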