Imagine a bit level systolic array. Just a sea of LUTs, with latches to allow the magic of graph coloring to remove all timing concerns by clocking everything in 2 phases.
GPUs still treat memory as separate from compute, they just have wider bottlenecks than CPUs.
GPUs still treat memory as separate from compute, they just have wider bottlenecks than CPUs.