This is not directly relevant, but Power ISA 3.1 (from May 2020) has 128 bit division operations (vdivuq and vdivsq) with a maximum latency of 61 cycles. I don't have access to a Power 10 machine to see how it compares to what's presented here, but I thought it was an interesting addition to the ISA.