by hershel 4382 days ago
YZF, why can't we start from an optimized FPGA, i.e. small memory blocks spread all around with massive bandwidth and low latency, and find a way to give the CPU decent enough access to all that memory?

And yes, I know the CPU will be the bottleneck, but it will be the bottleneck anyway.


I think it boils down to various constraints. If you want high bandwidth and low latency, things need to be physically close on the chip. Presumably an existing chip is already optimized given those constraints, and adding another component means you need to trade something else off.

The other thing I've seen, which may or may not apply to the Intel case, is that complexity in chip design is easier to manage when you have blocks that connect to standard interfaces. I.e., if you look inside the Xeon, it probably looks like a bunch of different chips thrown onto the same die with some standard interconnects. Most of the optimization effort goes inside those blocks, e.g. inside a single core, and it's a lot harder to put an FPGA close to the core than to just drop it somewhere else on the chip. That is, the number of engineers at Intel who are intimately familiar with the innards of the x86 core design and capable of making these sorts of changes is probably much, much lower than the number who can throw some external "block" onto the die and tie it into a standard bus.