Great article, always love to see the law of large numbers fail! You mention that the solution "requires all clients to cooperate". Wouldn't it be in the client's best interest to cooperate (it maximizes the probability that the request is served and minimizes latency)? I suppose a malicious client could deliberately route traffic to a fully utilized node, but that seems outside the scope of this problem space.
Good question! Yes, it's in the interest of clients to cooperate.
Fundamentally, P2C thickens the interface between client and server by leaking the implementation and responsibility of balancing server load onto the client. Again, this works really well if there is a small, static set of clients, as we have internally, but not as well if the clients are out in the wild (e.g., on mobile devices). It's totally in the client's interest to upgrade, but upgrading is always work, so you can't rely on it happening. :)
So imo this is a serious tradeoff that must be made before employing P2C. A good hybrid approach (briefly mentioned in the post) is to introduce a proxy that uses P2C. This is fully in your control and is cheaper than having the proxy maintain a full histogram of load on all downstreams.
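To make the proxy idea concrete, here is a rough Python sketch of the P2C selection step as a proxy might run it. The names (`P2CBalancer`, `acquire`, `release`, `in_flight`) are made up for illustration, and using the count of in-flight requests as the load signal is an assumption; the post's actual setup may measure load differently.

```python
import random


class P2CBalancer:
    """Hypothetical proxy-side balancer: sample two backends, use the less loaded one."""

    def __init__(self, backends, rng=None):
        if len(backends) < 2:
            raise ValueError("P2C needs at least two backends")
        # Assumed load signal: requests this proxy currently has in flight per backend.
        self.in_flight = {backend: 0 for backend in backends}
        self.rng = rng or random.Random()

    def acquire(self):
        # Pick two distinct backends uniformly at random.
        a, b = self.rng.sample(list(self.in_flight), 2)
        # Route to whichever of the two has fewer in-flight requests.
        chosen = a if self.in_flight[a] <= self.in_flight[b] else b
        self.in_flight[chosen] += 1
        return chosen

    def release(self, backend):
        # Call when the request completes so the counter reflects current load.
        self.in_flight[backend] -= 1


if __name__ == "__main__":
    lb = P2CBalancer(["s1", "s2", "s3", "s4"])
    for _ in range(1000):
        lb.acquire()
    print(lb.in_flight)
```

Note that the proxy only compares two counters per request and only tracks its own outstanding requests, so it never has to collect or sort a load histogram across every downstream.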