by vardump 2406 days ago
> Can you come up with an example where an oil or finance sector developer needs to understand MESIF?

Yes. When they're writing high-performance multithreaded analysis software and deciding which cache lines to write to and which to only read from. The lines you only read from can be in the S (shared) or F (forward) state.

And why is this important? The performance characteristics of a line entering the M (modified) state are pretty bad if the line is shared between multiple cores, because every other core's copy has to be invalidated first.
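A minimal sketch of the usual countermeasure, assuming 64-byte cache lines (typical on current x86-64, but check your target): give each thread's write-hot data its own line so one core's writes never pull a line out from under another core. The names `PaddedCounter`, `count_padded`, `kCacheLine` and `kThreads` are made up for illustration. C++17 also offers `std::hardware_destructive_interference_size` in `<new>` if you'd rather not hard-code 64.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. Adjust for your target CPU.
constexpr std::size_t kCacheLine = 64;
constexpr unsigned kThreads = 4;  // hypothetical worker count

// alignas pads the struct to a full line, so each counter sits on its own
// cache line and a write by one thread does not invalidate a neighbour's line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Runs kThreads workers that each increment only their own counter,
// then sums the counters after all workers have been joined.
std::uint64_t count_padded(std::uint64_t iterations_per_thread) {
    PaddedCounter counters[kThreads];
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < kThreads; ++t) {
        workers.emplace_back([&counters, t, iterations_per_thread] {
            for (std::uint64_t i = 0; i < iterations_per_thread; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& c : counters) {
        total += c.value.load(std::memory_order_relaxed);
    }
    return total;
}
```

Dropping the `alignas` packs all four counters into one line, and the same loop then bounces that line between cores in M state on every increment. How much slower that is depends on the machine, so measure rather than trust a rule of thumb.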

Perhaps you'll also want to do all reads at once, knowing the line will more likely remain in the F state for short periods, instead of bouncing the S/F line state between CPU cores?

NUMA also comes into play, making this mistake even more costly. You really want to keep inter-core (and especially inter-socket, on NUMA systems!) communication to a minimum.

Of course, you could say you don't strictly need to understand MESI (or MESIF), but it really helps with understanding why you do things a certain way and with reasoning about multi-core performance. The thing is, you can say the same "you don't need to know this" about a lot of "low level details" in the software trade.

Just like you need to understand the DOM as a front-end developer to minimize DOM changes, even if you don't access it directly.

In the cache coherency case it's analogously about reducing unnecessary multicast and broadcast messages sent between cores.


Thank you for writing a scenario and how it relates to coherency mechanisms.

So now we get to thinking about whether this gives the developer an advantage over just knowing "Dirtying cache lines across different cores/threads is slow". I don't think I would conclude so here.

But yeah, I like reading about microarchitectural details and other computer architecture topics, and am sympathetic to the point of view that knowing the "why" is nice. Just like I find it interesting to read about how DOM APIs are implemented in browsers and why they are hard to make faster...

> So now we get to thinking about whether this gives the developer an advantage over just knowing "Dirtying cache lines across different cores/threads is slow". I don't think I would conclude so here.

The hidden thing behind all this is that even if the data is just read-shared, it can still generate traffic between cores and sockets.

Since these communication links are a shared resource [0], doing things wrong hurts performance in unrelated code and on unrelated cores, simply because a storm of cache coherency packets is being sent between cores.

So yeah, you really do want to minimize this to maximize performance and scalability across the whole system!

[0]: In Intel's case, this shared resource is the ring bus inside a CPU socket and QPI between CPU sockets.

True, high validation traffic for read-shared data can indeed have effects in some cases, especially on multi-socket systems, and it can sometimes be beneficial to give each thread a private copy of even read-only data if the data is small.
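A rough sketch of that private-copy idea, under the assumption that the table is small (64 bytes here) and never changes once the workers start. `Table`, `sum_with_private_copy` and `run_workers` are hypothetical names. Whether the local copy actually helps depends on the hardware and on what the compiler does with it, so treat this as something to benchmark rather than a guaranteed win.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Small read-only lookup table: 16 x 4 bytes = 64 bytes.
using Table = std::array<std::uint32_t, 16>;

// Copies the shared table once, then performs all n lookups on the
// thread-private copy instead of on the shared memory.
std::uint64_t sum_with_private_copy(const Table& shared, std::size_t n) {
    const Table local = shared;  // single pass over the shared data
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += local[i % local.size()];
    }
    return sum;
}

// Starts `threads` workers, each with its own copy of the table, and
// returns one result per worker.
std::vector<std::uint64_t> run_workers(const Table& shared, unsigned threads,
                                       std::size_t n) {
    std::vector<std::uint64_t> results(threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        // Each worker writes its result slot exactly once, at the end, so the
        // adjacent slots in `results` are not a hot write-shared line.
        workers.emplace_back([&results, &shared, t, n] {
            results[t] = sum_with_private_copy(shared, n);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

On a multi-socket box you'd additionally want each copy to live in memory local to the thread's socket; a stack copy usually gets you that if the thread stays pinned, but that is an assumption about the OS allocation policy, not something this code enforces.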