by mechagodzilla 3621 days ago
Did you stick with the parallel, SERDES-less interfaces for your interchip I/O? 48 GB/s implies a pretty high signalling rate to not have a CTLE, DFE, etc.

Why 3 interchip links? What network topology are you planning to use to scale to large numbers of chips? If you're still using parallel I/O, how are you planning to communicate beyond a single PCB?

What memory interface are you using? The article seems to confuse your interchip links with your memory controller.


We have partnered with a startup (we'll announce who soon enough) who shared a lot of ideas about chip-to-chip I/O with me. While they call it a SerDes, it is in fact a source-synchronous (clock-forwarded) link that carries 5 bits over 6 wires. It is silicon proven and is capable of up to 125 Gb/s over 12 mm while being a little over 10x more energy efficient (in terms of pJ/bit) than other available VSR SerDes. Obviously it is short reach over PCB, but we imagine (yet to be tested) we can extend that reach a bit more using a more exotic PCB laminate (Megtron, Rogers, etc.), or by going over wire (tested to go over 6 inches using a Huber+Suhner SMA cable). Right now, we are only using it to go between chips in a Multi-Chip Module, or under 12 mm on a PCB. Big bonus: as of a month ago, it is a JEDEC standard!
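For a rough sense of what those link numbers imply, here is a minimal back-of-the-envelope sketch. It assumes the 125 Gb/s figure is the aggregate for one 6-wire group and that all 5 bits are sent in each unit interval; neither is spelled out above, and `link_figures` is just an illustrative helper name.

```python
from fractions import Fraction

# Figures quoted in the comment above.
BITS_PER_GROUP = 5     # bits carried per unit interval
WIRES_PER_GROUP = 6    # wires used to carry them
AGGREGATE_GBPS = 125   # ASSUMPTION: aggregate rate of one 6-wire group


def link_figures(bits=BITS_PER_GROUP, wires=WIRES_PER_GROUP,
                 aggregate_gbps=AGGREGATE_GBPS):
    """Derive per-wire and per-bit figures for one wire group."""
    return {
        # Pin efficiency: bits carried per wire per unit interval.
        "bits_per_wire": Fraction(bits, wires),
        # Symbol rate, assuming every bit toggles once per unit interval.
        "symbol_rate_gbaud": aggregate_gbps / bits,
        # Throughput attributed to each physical wire.
        "gbps_per_wire": aggregate_gbps / wires,
        # Same aggregate expressed in gigabytes per second.
        "gbytes_per_s": aggregate_gbps / 8,
    }


if __name__ == "__main__":
    for name, value in link_figures().items():
        print(f"{name}: {float(value):.3f}")
```

Under those assumptions each bit runs at 25 Gb/s and one group moves about 15.6 GB/s. For comparison, plain differential signalling carries 1 bit over 2 wires (0.5 bits per wire) versus 5/6 here.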

Most of the information in the linked article is very outdated (~16 months old). Since then we have decided to ditch the idea of having a separate DRAM interface and "External I/O" and just have our chip-to-chip interface on all four sides of the chip. The chip-to-chip interface uses the same protocol as our Network On Chip, and expands in the same 2D mesh. We are also looking into (with a sketched-out plan) how to directly interface this I/O with HBM dies that can be in the same MCM package. As far as supporting other memories/IOs, we are leaning towards having "adapter chips" that would convert our chip-to-chip interface to DDR4, Ethernet, Infiniband, etc.
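To illustrate what "the same 2D mesh" across chips can look like, here is a small sketch. It is not the actual design: the 4x4 core grid per chip, the global (x, y) addressing, and the dimension-order (X then Y) routing are all assumptions made for the example, and the function names are hypothetical.

```python
CORES_PER_SIDE = 4  # ASSUMPTION: 16 cores per chip laid out as a 4x4 grid


def locate(x, y, side=CORES_PER_SIDE):
    """Split a global mesh coordinate into (chip coord, local core coord)."""
    return (x // side, y // side), (x % side, y % side)


def xy_route(src, dst):
    """Dimension-order route: walk along X first, then along Y.

    Returns every mesh coordinate visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


def chip_crossings(path, side=CORES_PER_SIDE):
    """Count hops in the path that cross from one chip to a neighbour."""
    chips = [locate(x, y, side)[0] for x, y in path]
    return sum(1 for a, b in zip(chips, chips[1:]) if a != b)


if __name__ == "__main__":
    route = xy_route((1, 1), (6, 2))
    print(route)
    print("chip-to-chip hops:", chip_crossings(route))
```

The point of the sketch is that a router only ever sees mesh coordinates; whether the next hop is on the same die or across a chip-to-chip link falls out of the address, so the same protocol can serve both.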

As far as bandwidth numbers go, our aggregate bandwidth for the test chip we have just taped out (16 cores + 2 chip-to-chip I/O macros on TSMC 28nm, 12 mm^2 in size) is 60 GB/s, though for the planned production chip we will be over 256 GB/s. I have a good feeling we will be a fair margin higher than that, but I would rather under-promise and over-deliver.

25 Gb/s for a very short reach interconnect sounds possible, although having to go through an adapter chip is going to kill your latency from a system perspective. If you haven't already, you should check out the DE Shaw Research Anton 2 chip. It is on an older process, but it has 66 4-way processor cores running at 1.65 GHz and a roughly comparable network (although 6-way rather than 4-way), in addition to all of the MD-specific hardware. It uses a similar memory hierarchy (although it does use non-coherent caches). Getting good performance out of software-managed caches is very difficult in practice, even if you know your problem extremely well. With very carefully written software (and a sufficiently friendly problem) good performance is possible, but it definitely isn't easy.

Would it be possible to interface with HyperTransport or QPI? Can you name the JEDEC standard?

I highly doubt that a direct interface would be possible with either of them, though if you really wanted it, you could make an adapter (though fat chance Intel would open up QPI enough to allow for it). We haven't officially announced the partnership, though I can point you at JESD247.