Ethereum has several client implementations, and they must all follow the same consensus rules: they produce, gossip and accept transactions and blocks that follow a certain format and respect certain invariants. When that specification is updated, a hard fork occurs. That's not a problem if every client implements the new spec perfectly, but sometimes bugs happen and clients disagree on what constitutes the canonical chain. That is what happened here: the OpenEthereum client (11% of the network) had a bug following the Berlin hard fork (network upgrade). Nodes running this client are stalling at a certain block because, due to a bug in the implementation, they don't recognize the new "post-Berlin" blocks as legitimate.
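A minimal sketch of why a client with a faulty validation rule stalls instead of crashing or skipping ahead: it imports blocks in order and cannot build on the first block it considers invalid, because every later block descends from it. The `Block` type, the fork height and both `valid_*` functions are hypothetical stand-ins, not any client's real code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    post_fork_feature: bool  # hypothetical: block uses something the fork introduced


FORK_BLOCK = 100  # hypothetical activation height


def valid_correct(block: Block) -> bool:
    # Spec: the new feature is allowed from FORK_BLOCK onward.
    return not block.post_fork_feature or block.number >= FORK_BLOCK


def valid_buggy(block: Block) -> bool:
    # Hypothetical bug: this client never accepts the new feature, even after the fork.
    return not block.post_fork_feature


def sync(chain: list[Block], is_valid) -> int:
    """Import blocks in order; return the height of the last block accepted."""
    head = -1
    for block in chain:
        if not is_valid(block):
            break  # later blocks descend from this one, so the node stops here
        head = block.number
    return head


chain = [Block(n, post_fork_feature=(n >= 102)) for n in range(105)]
print(sync(chain, valid_correct))  # 104
print(sync(chain, valid_buggy))    # 101
```

Both clients see the same blocks; only the validation rule differs, and the buggy one stays stuck at the last block it accepted.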
So are protocol-level changes announced ahead of time with a precise date on which they come into force, and do the clients hard-code that date as the transition point? Or is a block number used instead?
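For what it's worth, pre-Merge upgrades such as Berlin were scheduled by block number, not by date. Each client ships a chain configuration containing the activation height (Geth's chain config, for example, has a `BerlinBlock` field), and any announced date is only an estimate based on average block time. The sketch below illustrates the idea with one of Berlin's changes: typed transactions (EIP-2718), with EIP-2930 access-list transactions as type 1, which are only valid from the fork block on. `ChainConfig` and `tx_type_allowed` are hypothetical simplifications, not Geth's actual types; the mainnet Berlin height of 12,244,000 is the only real value in it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChainConfig:
    # Hypothetical, simplified chain config: activation height per fork.
    berlin_block: Optional[int]

    def is_berlin(self, number: int) -> bool:
        # A fork's rules apply to every block at or above its activation height.
        return self.berlin_block is not None and number >= self.berlin_block


LEGACY_TX = 0x00       # untyped legacy transaction, conventionally treated as type 0
ACCESS_LIST_TX = 0x01  # EIP-2930 transaction type, introduced with Berlin


def tx_type_allowed(config: ChainConfig, block_number: int, tx_type: int) -> bool:
    """Return True if a transaction of tx_type may appear in block_number."""
    if tx_type == LEGACY_TX:
        return True
    if tx_type == ACCESS_LIST_TX:
        return config.is_berlin(block_number)
    return False  # later transaction types are out of scope for this sketch


mainnet = ChainConfig(berlin_block=12_244_000)
print(tx_type_allowed(mainnet, 12_243_999, ACCESS_LIST_TX))  # False
print(tx_type_allowed(mainnet, 12_244_000, ACCESS_LIST_TX))  # True
```

Using a block height means every node switches rules at exactly the same point in the chain, regardless of clock skew.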
It’s “just” a client issue, and OpenEthereum is “only” used by ~11% of the nodes. I believe it was formerly known as Parity, which had a major bug related to frozen coins some years ago.
Theoretically there’s a protocol standard, and every client implementation is equal. Functionally, whatever Geth does is the actual standard because a supermajority of nodes run Geth.
This is not true, and I can say that because I work on Geth. We take great care to ensure that all clients behave according to the specification, and Geth has had similar faults in the past.
Surely it doesn't matter what the specification says? If the majority of the hashpower is using an implementation which deviates from the spec, then the blockchain will follow that code and not the 'correct' version. By the time the devs have fixed the bugs, it will be too late and too costly to roll back the chain and reverse all the subsequent transactions.
In theory you're right, yes. If the majority of the hashpower runs a specific version with specific consensus rules, then that version of the chain would "win".
But in reality, developers and others write the specification, which gets implemented in the clients, and when a new version is available, miners usually upgrade to it without any qualms whatsoever. So in practice the specification is what controls the network, because the developers writing the clients implement what the specification says.
I mean in terms of bugs, rather than a 'rogue' client, i.e. not a deliberate attempt to cause a chain divergence.
If a 'broken' transaction gets accepted by a buggy client, and that buggy client has a majority of the hashpower, then that transaction is, by definition, not actually broken at all, and everyone will be forced to accept it (because it would take too long to write a bug fix and rewrite the blockchain history).
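A toy model of this scenario: miners running the buggy client accept a block the spec forbids and keep extending that branch, while correct clients reject it and extend the other one. Each node then follows the longest branch that is valid under its own rules. Hashpower decides how long each branch gets, but a node never counts a branch it considers invalid. The 60/40 split, the deterministic block production and the "longest chain" rule (a simplification of proof-of-work Ethereum's total-difficulty rule) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    contains_disputed_block: bool  # True if the branch builds on the block the spec forbids
    length: int = 0


def simulate(buggy_pct: int, rounds: int) -> list[Branch]:
    """Grow both branches after the split; miners extend whichever branch their client accepts.

    Deterministic toy model: each round a branch grows in proportion to the
    hashpower mining on it. Real block production is random.
    """
    buggy = Branch("accepts disputed block", contains_disputed_block=True)
    correct = Branch("rejects disputed block", contains_disputed_block=False)
    for _ in range(rounds):
        buggy.length += buggy_pct
        correct.length += 100 - buggy_pct
    return [buggy, correct]


def head_for(accepts_disputed: bool, branches: list[Branch]) -> Branch:
    """Fork choice: the longest branch among those valid under this node's rules."""
    valid = [b for b in branches if accepts_disputed or not b.contains_disputed_block]
    return max(valid, key=lambda b: b.length)


branches = simulate(buggy_pct=60, rounds=10)
print(head_for(True, branches).name)   # accepts disputed block
print(head_for(False, branches).name)  # rejects disputed block
```

In this model the majority branch only "wins" for nodes whose rules accept the disputed block; the rest of the network stays on the other branch until it either changes its rules or the buggy miners switch back.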
Reading the comments, this seems more like a technical problem in the consensus implementation of some Ethereum node clients (in particular OpenEthereum and Geth).