Hacker News new | ask | show | jobs
by nemothekid 2401 days ago
I've never heard of the MESI protocol before so that was really interesting to read, and I liked the comparison to distributed systems.

I'm wondering if the MESI protocol could be used in a networked database manner? I feel like you need master node though to coordinate everything though (like the L2 does in the example).

2 comments

The essential point of MESI is that it doesn't need a single master. In a sense, anyone who has exclusive ownership of a cache line is the "master" for that one cache line.

The downsides of MESI are that (a) it requires broadcasts, which don't scale very well; and (b) it doesn't tolerate partitions -- which also imposes an effective scaling limit, since large systems are always partitioned (usually with a partition of N-k and k partitions of 1, due to k nodes having failed).

> The downsides of MESI are that (a) it requires broadcasts

No, it can be implemented with directory instead, e.g.

https://en.wikipedia.org/wiki/Directory-based_cache_coherenc...

Or various combinations of snooping and directories ("snoop filters", or directories that act as "bridges" between broadcast domains, etc.).

In current Xeon processors (and presumably AMD EPYC as well, thought I don't yet have first-hand experience with those), you have a couple of directories per CPU with snoop filtering, as with tens of cores broadcasting becomes a scalability bottleneck. In the BIOS you can change the mode how it operates, with slightly different names and semantics depending on the CPU generation.

More than just broadcasts as we think of them in networked systems, it requires serialized/transactional broadcasts. Network distributed systems also have problems with ordering which is something made easier when you have a single bus with all transactions in sequence.
> I'm wondering if the MESI protocol could be used in a networked database manner?

The short answer is yes, it can, though you tend to use key ranges (or a "predicate range" for generality) rather than cache lines and addresses.

Taken to its extreme it can produce extremely fast and versatile distributed databases to the extent that different nodes are accessing non-overlapping key ranges, or only reading from shared key ranges.

Neither a master nor broadcasts are really needed. (Though avoiding masters is quite complicated to do right; and a little trickle of broadcasting is often used, just for nodes to discover each other, and check they are still up, but not for every cache transaction.)

The general MESI-like pattern is probably used more commonly for coherent network filesystems than databases, though.

Many network filesystems uses "leases" (or "oplocks") on files or ranges, which are similar to the states in MESI if the filesystem is one of the good ones that is coherent.

This bit of Documentation for SMB, the Microsoft network filesystem, might remind you of the S and M/E states in MESI:

  · Read-caching lease: allows caching reads and can be *shared* by multiple clients.
  · Write-caching lease: allows caching writes and is *exclusive* to only one client.