by white-flame 3290 days ago
> I'm not sure what you're trying to get at though; even with logic in DRAM you have to go off chip to get to terabyte levels, so I don't see the advantage.

Many-core processors with low-latency, wide-bus on-chip random access still need to scale horizontally as well. Focusing on large chips means you're not going to fit very many on a single motherboard, which is where QPI/HyperTransport/memory-bus style communication can achieve faster and more user-transparent shared-memory access than off-board networking.

The "stacks" I was talking about are just the rows of DIMM slots stacked together in tight proximity, compared to the number of CPUs/GPUs/etc. per unit area that a multi-socket motherboard needs to achieve the same memory footprint. (Admittedly that's easily apples and oranges in the current incarnations, but I'm focusing on end-user expandability and configuration options.)

In my opinion, larger memory systems built on this type of on-chip fast-RAM model would benefit most from splitting up processing to where the memory is, as opposed to a fatter node model, especially when it comes to physical size and inter-chip communication across many chips.

However, if we soon have many-core chips with 32 parallel memory buses leading to in-package 256GB DRAM silicon, it does become more moot.

Yes, I know that 3D silicon stacking, HBM, etc. exist now. While they've shown some good speed and power advantages, they remain very limited in terms of memory footprint. And of course the memory size is fixed per chip, and there doesn't seem to be a path to many-chip expansion for anything but the top-end enterprise market. I think the Venray model has a simplicity and expandability that keeps the most advantageous tradeoffs.


Okay, I think we're talking past each other at this point so let me be as clear as possible: the original comparison was between TSVs and logic in DRAM. Both of these are a way to get DRAM on chip and as physically close to the core logic as possible. Logic in memory is on die, while TSVs are on package; neither can be "extended" by an end-user without connecting off chip. Neither changes the physical package size very much (TSVs are not intrinsically bigger than logic in DRAM). Both have nothing to do with off chip connections; as soon as you start talking about things happening off package they behave identically (grids of processor/DRAM combos can be done with either in exactly the same way). Any chip you like can have TSVs (many core, single core, big, small, whatever); there's no architecture that logic in DRAM can have that TSVs can't. Both can be used to "split up processing to where the memory is". Neither has to be a "fat node".

So with that out of the way, what exactly is the advantage of logic in memory? Because so far nothing you have described is actually an intrinsic advantage.

Right, it becomes less about individual chips' TSVs vs logic in DRAM, and more about the scalability of the architecture. In the marketplace right now, the trend for these TSV/interposer/multi-die sorts of devices is toward "fat node" designs rather than more distributed on-board designs.

Logic on DRAM should be simpler and cheaper, which would in the long tail lend itself to more horizontal scaling (and horizontal scaling is currently required to get large memory footprints economically). More elaborate and expensive designs would tend to end up in fat node designs. There's really no technical difference when looking at many-chip architectures, since the chip package is a black box at that level; the difference is more an economic one.