|
|
|
|
|
by cuno
945 days ago
|
|
Funnily enough I published something similar as part of my PhD (2010). Essentially communication costs dominate over computation costs - an addition operation is practically free compared to the time and energy of moving data within a chip to the ALU to do the addition, let alone between chips. This is now the reverse situation to historical VLSI - whereby the computation was slow and expensive compared to practically "fast and free" on-chip communication. The implications extend far beyond just RAM. But yes, depending on the access patterns involved and the (physical) spatial arrangement of data, binary tree traversal (on 2D CMP or 2D cross-chip layout) is O(sqrt(N)) or O(log(N)sqrt(N)), and this result is not dependent on the system boundaries of memory hierarchies - it is true for dedicated on-chip scratchpad memories as it is for giant wafer-scale chips (such as Cerebras) as well. There are some special cases where it can be O(1) or O(log(N)) on average that I won't go into, but you can read more here if interested (sections 2.1 and 8.3): https://www.cl.cam.ac.uk/~swm11/research/greenfield.html A key way to think about things is that algorithms are not really running in some Platonic realm, each executed instruction occurs at some physical location and time, and data needs to move from one place and time to another place and time. To this end both physical wiring (or networks) are required to move the data spatially, and memory is used to move the data temporally. Together an algorithm's executed instructions are situated somewhere spatio-temporally and RAM (both on-chip and off-chip) serves as a temporal interconnect, but one that itself takes up physical space. |
|
It gets even more interesting (to me) when you consider semantic entanglement of data in non-platonic spaces. When you treat 'time to data access' as a fourth dimension and the state transition vector of an algorithm as the path one can show that Ft(O(path(n))) (Ft being the function that converts the complexity of a path into the time such a path takes to transit) for some arbitrary n is rather difficult to nail down. It can also pop out the result that the lowest complexity path(algorithm) is not faster than a high complexity path(algorithm) if the entangled elements(data) are in a slow space. Crazy I know but it is a pretty straight forward path from Amdahl's law to this sort of analysis.