by lifis 62 days ago
But why doesn't the CPU just lock two cachelines? Seems relatively easy to do in microcode, no? Just sort by physical address with a conditional swap and then run the "lock one cacheline algorithm" twice, no?

Perhaps the issue is that each core has a locked cacheline entry for each other core, but even then, given the size of current CPUs, doubling it shouldn't be that significant. And one could also add just a single extra entry and then have a global lock, but one that only locks the ability to lock a second cacheline.

3 comments

There is really no such thing as cacheline locking per se. As far as I understand, the coherence protocol guarantees that the CPU can hold a cacheline in the exclusive state for a certain set amount of cycles, which is enough to write the top element of the store buffer into it. Making sure that the two cachelines are available at the same time would either add significant complexity to the coherence protocol, which is already one of the most complex bits of the system and very hard to validate, or force a potentially unbounded retry/backoff loop with no guaranteed forward progress.
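To make the "retry/backoff loop" option concrete, here is a rough software analogy in Python, not a model of any real coherence protocol. The two `threading.Lock` objects stand in for "holding a line exclusively", and `acquire_both_with_backoff` and `max_attempts` are hypothetical names made up for this sketch. The point is only that the loop needs an arbitrary attempt cap, because nothing in it guarantees the second line ever becomes available while the first is held.

```python
import random
import threading
import time

def acquire_both_with_backoff(lock_a, lock_b, max_attempts=1000):
    """Try to hold both locks at once; give up after max_attempts.

    Returns the attempt number on success (both locks are then held by
    the caller), or None if every attempt failed (no lock is held).
    """
    for attempt in range(1, max_attempts + 1):
        lock_a.acquire()                      # hold the first "line"
        if lock_b.acquire(blocking=False):    # second "line" free right now?
            return attempt
        # Could not get both: release the first so others can progress,
        # wait a random short time, and try again.
        lock_a.release()
        time.sleep(random.uniform(0, 0.0001))
    return None                               # no forward progress was made
```

Without the cap this loop can spin forever under contention, which is the forward-progress problem described above.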
I suspect it's the risk of deadlocks and perhaps they have no easy way to avoid it.
Ordering lock acquisition is a tested strategy to avoid deadlocks, so locking the cache lines sorted by PA would cover that?
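For reference, the software version of that strategy looks roughly like the sketch below. It is only an analogy: `CacheLine`, `phys_addr`, `locked_pair_update`, and `run_demo` are hypothetical names, and the locks are ordinary mutexes rather than anything a CPU actually does. The two lines are assumed to be distinct, as they are for a split lock.

```python
import threading

class CacheLine:
    """Toy stand-in for a cacheline: an address, a lock, and a value."""
    def __init__(self, phys_addr):
        self.phys_addr = phys_addr
        self.lock = threading.Lock()
        self.value = 0

def locked_pair_update(a, b, delta):
    """Move delta from line a to line b while holding both locks.

    Assumes a and b are different lines.
    """
    # Conditional swap: always take the lower address first, so every
    # thread acquires the two locks in the same global order.
    first, second = (a, b) if a.phys_addr <= b.phys_addr else (b, a)
    with first.lock:
        with second.lock:
            a.value -= delta
            b.value += delta

def run_demo(iterations=10_000):
    x, y = CacheLine(0x1000), CacheLine(0x1040)

    def worker(src, dst):
        for _ in range(iterations):
            locked_pair_update(src, dst, 1)

    # The two threads name the lines in opposite order; without the
    # sort this is the classic lock-order-inversion deadlock.
    t1 = threading.Thread(target=worker, args=(x, y))
    t2 = threading.Thread(target=worker, args=(y, x))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return x.value, y.value
```

Calling `run_demo()` should finish with both values back at zero, since the two threads move the same total amount in opposite directions. Whether the same ordering trick is cheap to implement in a coherence protocol is a separate question.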
I assume it's to save on resources. Even if your algorithm is not much more taxing on silicon, maybe the designers at Intel and AMD just didn't think optimizing split locks was worth it.