I'm not sure exactly what the grandparent comment meant, but I think I have an idea. I only skimmed the contents, so take this with a grain of salt.

Your book focuses on a pretty narrow part of distributed computing. I would rename it "Managing State in Distributed Systems" or "Distributed Storage Systems". Your examples are Bigtable and Dynamo, which fall into this category. The book seems to be aimed at a "beginning" audience, but the topics are inappropriate for beginners and skewed toward experts.

Real distributed systems try to be stateless wherever possible. You need "big computer science" to manage state in distributed systems, but most code in a distributed system should not manage state. These techniques should be confined to specialized storage systems. Here are some examples of real-world distributed systems that don't use the described techniques to manage state:

- clusters of stateless web servers + a single master database (99%+ of websites people use)
- message queue / work queue. A single machine can productively manage 1,000 to 10,000 stateless workers, depending on the workload.
- MapReduce
- Original GFS
- Napster
- BitTorrent (both tracker-based and trackerless would be interesting to write about)
- Bitcoin
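The stateless-worker pattern behind the message/work-queue example can be sketched in a few lines. This is a minimal sketch, assuming threads stand in for worker machines and an in-process `queue.Queue` stands in for a real message broker; `work` and `run_pool` are hypothetical names. The coordinator owns all the state (pending tasks and collected results), while each worker keeps nothing between tasks, which is what lets you scale out by simply adding more of them.

```python
import queue
import threading


def work(item):
    """Stateless task: the output depends only on the input (hypothetical example)."""
    return item * item


def run_pool(items, num_workers=4):
    """Single coordinator owns all state; workers are interchangeable and stateless."""
    tasks = queue.Queue()    # stands in for a message broker (assumption)
    results = queue.Queue()  # results flow back to the coordinator

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            results.put((item, work(item)))
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:  # one sentinel per worker so every thread exits
        tasks.put(None)
    for t in threads:
        t.join()

    # All workers have exited, so draining the results queue is safe here.
    out = {}
    while not results.empty():
        key, value = results.get()
        out[key] = value
    return out


if __name__ == "__main__":
    print(run_pool(range(5)))
```

Because workers carry no state, the only thing that needs careful engineering is the coordinator (or the broker behind it); no consensus protocol is involved.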
The title seems to imply a practical bent, but the book seems more like a collection of ideas, which are important and interesting but not really what engineers need to know. IMO the #1 skill for distributed computing is being competent at BOTH programming a single computer and system administration.

If I wanted to be harsh, I would say it looks like you read a bunch of stuff without working with it or implementing it. At the very least, the ideas don't seem to be put in the context of commonly deployed distributed systems. People need to understand these simpler, more robust, and more performant techniques, and how to apply them to their specific problem domain, rather than blindly throwing consensus at every problem (a disturbing trend I've seen).
Another topic that's huge all by itself is peer-to-peer networks and all their associated aspects, such as structured (DHTs like Chord, Cassandra, etc.) vs. unstructured (Gnutella, Kazaa, etc.) overlays, P2P search, handling churn, handling peers with heterogeneous capabilities, peer selection, topology organization, decentralized routing, and file-sharing (torrents) vs. streaming (PPLive, Spotify).
Other topics (with several overlapping aspects) include:
- Security, such as Sybil attacks, group key management, etc.;
- Overlay networks;
- CDNs;
- Ad hoc and mesh networks;
- MMOs and multiplayer games;
- SCADA and industrial control systems;
- Pub/Sub systems and application layer multicast;
- Distributed file systems;
- Load balancing and bandwidth management.
And that's just off the top of my head... I'm sure I'm missing other important topics.