by kevinnk 3290 days ago
> Do you see interposer style designs as linking up terabytes of DRAM?

I don't think we're going to see a terabyte of DRAM on an interposer for a while (4GB is about the max you can get commercially right now). I'm not sure what you're trying to get at, though; even with logic in DRAM you have to go off chip to get to terabyte levels, so I don't see the advantage.
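To put a rough number on "you have to go off chip", here is a back-of-the-envelope sketch. It assumes binary units (1 TB = 1024 GB) and takes the 4GB-per-package figure above at face value; `packages_needed` is just an illustrative helper, not anything from a real tool.

```python
GIB = 1024 ** 3  # assumption: binary gigabytes


def packages_needed(target_bytes: int, per_package_bytes: int) -> int:
    """Smallest number of packages whose combined DRAM reaches the target."""
    # Integer ceiling division avoids floating-point rounding.
    return (target_bytes + per_package_bytes - 1) // per_package_bytes


# 1 TB total at 4 GB of in-package DRAM each (the figure quoted above).
print(packages_needed(1024 * GIB, 4 * GIB))  # prints 256
```

Under those assumptions a terabyte means hundreds of packages, all talking to each other over off-package links, whichever way the DRAM got into each package.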

> All the chips you're talking about are pretty major dies, not really suitable for having many stacks of them in conventionally tightly spaced DIMM arrays to reach such RAM sizes.

The stacking happens in package (<1mm thick). Your DIMM array is going to have to be pretty damn tight for that to matter.

> Of course, 3d chip advances might throw all current assumptions out the window and change the layout of everything.

TSVs are 3D (or "2.5" depending on the configuration). You should have thrown out the assumptions back in 2014.

> I'm not sure what you're trying to get at though; even with logic in DRAM you have to go off chip to get to terabyte levels, so I don't see the advantage.

Many-core processors with low-latency wide-bus on-chip random access speeds need to scale horizontally as well. Focusing on large chips means you're not going to have very many on a single motherboard, where QPI/HyperTransport/memory-bus style communication can achieve higher and more user-transparent shared memory access performance, compared to offboard communication networking.

The "stacks" I was talking about are just the rows of DIMM slots stacked together in tight proximity, compared to the number of CPUs/GPUs/etc per unit area on a multi-socket motherboard to achieve the same memory footprint. (easily apples and oranges in the current incarnations, admittedly, but focusing on end-user expandability and configuration options)

In my opinion, this type of on-chip fast-RAM model in larger memory systems would best take advantage of splitting up processing to where the memory is, as opposed to a fatter node model, especially when it comes to physical size and inter-chip communication of many chips.

However, if we soon have many-core chips with 32 parallel memory buses leading to in-package 256GB DRAM silicon, it does become more moot.

Yes, I know that 3D silicon stacking, HBM, etc. exist now. While they've had some good speed & power advantages, they remain very limited in terms of memory footprint. And of course, the memory size is fixed per chip, and there doesn't seem to be a path for many-chip expansion solutions for anything but the top-end enterprise market. I think the Venray model has a simplicity and expandability that keeps the most advantageous tradeoffs.

Okay, I think we're talking past each other at this point so let me be as clear as possible: the original comparison was between TSVs and logic in DRAM. Both of these are a way to get DRAM on chip and as physically close to the core logic as possible. Logic in memory is on die, while TSVs are on package; neither can be "extended" by an end-user without connecting off chip. Neither changes the physical package size very much (TSVs are not intrinsically bigger than logic in DRAM). Both have nothing to do with off chip connections; as soon as you start talking about things happening off package they behave identically (grids of processor/DRAM combos can be done with either in exactly the same way). Any chip you like can have TSVs (many core, single core, big, small, whatever); there's no architecture that logic in DRAM can have that TSVs can't. Both can be used to "split up processing to where the memory is". Neither has to be a "fat node".

So with that out of the way, what exactly is the advantage of logic in memory? Because so far nothing you have described is actually an intrinsic advantage.

Right, it becomes less about individual chips' TSVs vs logic in DRAM, and more about the scalability of the architecture. In the marketplace right now, the trend for these TSV/interposer/multi-die sorts of devices is in "fat node" designs, instead of more on-board distributed designs.

Logic on DRAM should be simpler & cheaper, which would in the long tail lend itself to more horizontal scaling (and horizontal scaling is currently required to get large memory footprints economically). More elaborate & expensive designs would end up more in fat node designs. There's really no technical difference when looking at many-chip architectures as the chip package is a black box at that level, but it's more an economic one.