
I think it boils down to various constraints. If you want high bandwidth and low latency, you need to be physically close on the chip. Presumably an existing chip is already optimized given those constraints, and adding another component means you need to trade something else off.

The other thing I've seen, which may or may not apply to the Intel case, is that complexity in chip design can be managed more easily by having blocks that connect to standard interfaces. I.e. if you look inside the Xeon, it probably looks like a bunch of different chips that were thrown onto the same die with some standard interconnects. Most of the optimization effort goes inside those blocks, e.g. inside a single core, and it's a lot more difficult to add an FPGA closer to the core vs. just throwing it somewhere else on the chip. That is, the number of engineers at Intel who are intimately familiar with the innards of the x86 core design and are capable of making these sorts of changes is probably much, much lower than the number who are capable of throwing some external "block" onto the die and tying it into a standard bus.