by awinograd 3667 days ago
I think it has to do with distance electricity has to travel. If a chip is physically bigger, it takes longer to move bits inside of it. Sort of similar to why you can't have an L1 cache the same size as main memory.

Just my best guess from a single processor architecture class so definitely not positive that's the answer.

> I think it has to do with distance electricity has to travel.

Correct. The chip has to be small enough that the clock can propagate everywhere within the chip within a single cycle, or problems will occur.

> If a chip is physically bigger, it takes longer to move bits inside of it.

Yes, so you would either need to wait some cycles to ensure that the information has propagated (which basically nullifies the performance gain from cranking up your clock) or clock parts of the chip differently. Either way, you're pretty much always limited by the slowest part of the chip, which is why every modern chip has a cache: otherwise it would stall waiting for data.
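
To put rough numbers on the propagation argument, here is a back-of-the-envelope sketch. The function name and the 0.5 velocity fraction are my own assumptions, not figures from any datasheet. Real on-chip wires are RC-limited and considerably slower than this, so treat the result as an optimistic upper bound.

```python
# Optimistic upper bound on how far a signal can travel in one clock period.
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_per_cycle_mm(clock_hz: float, velocity_fraction: float = 0.5) -> float:
    """Distance in millimetres covered during one clock period.

    velocity_fraction is an ASSUMED fraction of the speed of light.
    Real on-chip interconnect is much slower than this assumption.
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    if not 0 < velocity_fraction <= 1:
        raise ValueError("velocity_fraction must be in (0, 1]")
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * velocity_fraction * period_s * 1000.0


for ghz in (1, 3, 5):
    mm = distance_per_cycle_mm(ghz * 1e9)
    print(f"{ghz} GHz: at most about {mm:.1f} mm per cycle")
```

Even under this generous assumption the reach per cycle shrinks in direct proportion to the clock rate, which is the trade-off being described.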

> The chip has to be small enough that the clock can propagate everywhere within the chip within a single cycle, or problems will occur.

Problems like what? These chips are already chopped up into different clock domains, and it's easy to install some PLLs so that perfectly-synchronized clock signals can blanket a chip even if it's inches across.

Moving data around is also not a big deal. The Xeons in the article already have multi-nanosecond ring buses running around between cores[1]. They don't slow the chip down because the design simply lets long-distance data transfers take multiple cycles. L3 and I/O don't have to be blazingly fast in terms of latency.

[1] http://images.anandtech.com/doci/8423/HaswellEP_DieConfig.pn...
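
A small sketch of what "take multiple cycles" means in practice. The 2 ns and 3 GHz figures are made-up illustrations, not measurements of the Xeons above, and the fully pipelined link is a simplifying assumption about how such a bus behaves.

```python
import math


def cycles_for_latency(latency_ns: float, clock_ghz: float) -> int:
    """Whole clock cycles needed to cover a given wire or bus latency."""
    if latency_ns < 0 or clock_ghz <= 0:
        raise ValueError("latency_ns must be >= 0 and clock_ghz must be > 0")
    return math.ceil(latency_ns * clock_ghz)


def pipelined_transfer_cycles(hop_cycles: int, num_items: int) -> int:
    """Cycles to deliver num_items over a fully pipelined link (assumed model).

    The first item arrives after hop_cycles; every later item arrives
    one cycle after the previous one.
    """
    if num_items <= 0:
        return 0
    return hop_cycles + num_items - 1


hop = cycles_for_latency(latency_ns=2.0, clock_ghz=3.0)  # hypothetical numbers
total = pipelined_transfer_cycles(hop, num_items=1000)
print(f"one hop costs {hop} cycles; 1000 items take {total} cycles")
```

In this model the hop latency is paid once up front, and after that the link still delivers one item per cycle, so the clock itself never has to slow down.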

> Problems like what?

The chip won't work.

> These chips are already chopped up into different clock domains, and it's easy to install some PLLs so that perfectly-synchronized clock signals can blanket a chip even if it's inches across.

Sorry, I didn't explain myself well enough. Of course chips have different clock domains, but these also come at a cost. The more synchronization you need to do between domains, the less die space you have for computationally useful stuff.

> L3 and I/O don't have to be blazingly fast in terms of latency.

I would argue differently: the impact of latency is highly dependent on the type of computation you're doing. If you're doing something with a lot of data (say, encoding video) then you need to be moving data as quickly as possible between the processor and memory. Any additional latency in cache or I/O will cause the performance to suffer.

Ideally, you want the latency of L3 and I/O to be as low as possible.

Ideally you want L3 to be fast, but it's going to be somewhat slow just by the nature of being large. And talking to RAM is going to be slow even if the intra-chip pathways are infinitely fast. An extra few percent off-core latency isn't the end of the world if it lets you fit ten times as much computation on the die. L2 and L1 won't be affected.
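
As a rough illustration of the "few percent" point, here is the usual average-memory-access-time recurrence. Every latency and miss rate below is a hypothetical number I picked for the example, not a measurement of any real chip.

```python
def average_access_cycles(levels, memory_cycles):
    """Average access time in cycles for a cache hierarchy.

    levels: list of (hit_latency_cycles, miss_rate), ordered from L1 outward.
    memory_cycles: latency of main memory in cycles.
    """
    total = float(memory_cycles)
    # Work from the outermost cache back toward L1.
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total


# Hypothetical hierarchy: (hit latency in cycles, local miss rate)
baseline = [(4, 0.05), (12, 0.30), (40, 0.30)]
slower_l3 = [(4, 0.05), (12, 0.30), (42, 0.30)]  # L3 made 5% slower

base = average_access_cycles(baseline, memory_cycles=200)
slow = average_access_cycles(slower_l3, memory_cycles=200)
print(f"baseline: {base:.2f} cycles, slower L3: {slow:.2f} cycles")
print(f"relative change: {100 * (slow - base) / base:.2f}%")
```

With these assumed numbers, a 5% slower L3 moves the average access time by well under 1%, because most accesses never reach L3. Different miss rates would change the size of the effect.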

And encoding video is nearly the platonic ideal of not caring about memory latency. You could easily make memory requests ten thousand cycles before you need the results. You just need throughput.
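
The throughput-versus-latency point can be sketched with Little's law: the amount of data you need in flight is bandwidth times latency. The 50 GB/s, 100 ns and 64-byte figures are illustrative assumptions, not the specs of any particular machine.

```python
import math


def requests_in_flight(bandwidth_gb_per_s: float, latency_ns: float,
                       request_bytes: int = 64) -> int:
    """Outstanding requests needed to keep a memory link saturated.

    Uses Little's law: bytes in flight = bandwidth * latency.
    1 GB/s equals 1 byte/ns (taking 1 GB as 10**9 bytes), so the
    product of the two arguments is already in bytes.
    """
    if bandwidth_gb_per_s < 0 or latency_ns < 0 or request_bytes <= 0:
        raise ValueError("arguments must be non-negative, request_bytes > 0")
    bytes_in_flight = bandwidth_gb_per_s * latency_ns
    return math.ceil(bytes_in_flight / request_bytes)


for latency_ns in (100, 110):  # hypothetical: 10% extra latency
    n = requests_in_flight(bandwidth_gb_per_s=50, latency_ns=latency_ns)
    print(f"{latency_ns} ns latency: keep about {n} requests in flight")
```

Under this model, extra latency costs you a few more outstanding requests rather than any bandwidth, provided the workload can issue its requests far enough ahead.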