Fastly's MAC-based solution to this was actually one of the existing implementations we read about when designing the original version of GLB in 2015/16, along with Facebook's IPVS-based solution. We loved the ideas behind Fastly's model, but didn't want to mess with Layer 2 to do it. GLB Director took some inspiration from both designs in creating L4 second chance and the L4/L7 split design.
Thanks for the details! I'd love to hear more about any data you have on the efficiency of a "second chance" design vs. expanding to three or more failover servers. I'm very curious whether a single alternate is enough to cover the majority of incidents (xxx out of 1000 events?), or how frequently you see failures that fall outside what the two-chance design covers.