These kinds of tricks are still used today. They're not so useful if you need a reciprocal or square root, since CPUs now have dedicated hardware for that, but it's different if you need a _cube_ root or x^(1/2.4).
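As a rough illustration of the cube-root case, here is a minimal C sketch. A float's bit pattern, read as an integer, is roughly a scaled and biased log2 of the value, so dividing it by 3 and re-adding a bias gives a crude cube root. Two Newton-Raphson steps on f(y) = y^3 - x then refine it. The function name and the exact bias constant are assumptions for illustration, not taken from any particular library.

```c
#include <stdint.h>
#include <string.h>

/* Approximate cube root of a positive, normal float x (sketch only). */
static float cbrt_approx(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);   /* reinterpret the float's bits without UB */

    /* Bits(x) ~ 2^23 * (log2(x) + 127). Dividing by 3 scales the log;
       the bias restores the exponent offset. The untuned bias is
       (2/3) * 127 * 2^23 = 710235477; this value is nudged down slightly
       to balance the error (the exact tuning is an assumption). */
    i = i / 3 + 709958130u;

    float y;
    memcpy(&y, &i, sizeof y);

    /* Newton-Raphson on y^3 - x = 0:
       y <- y - (y^3 - x) / (3 y^2) = (2y + x / y^2) / 3.
       Each step roughly squares the relative error. */
    for (int k = 0; k < 2; k++)
        y = (2.0f * y + x / (y * y)) * (1.0f / 3.0f);

    return y;
}
```

The same idea should extend to x^(1/2.4): multiply the integer by 1/2.4 instead of dividing by 3, use the correspondingly different bias (127 * 2^23 * (1 - 1/2.4) before tuning), and refine with a matching iteration.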
The big cores do use these iterative methods internally. They essentially pump division through something like an FMA (fused multiply-add) unit, possibly the same unit used for ordinary multiplication and addition, which carries out the Newton-Raphson or Goldschmidt steps.
In hardware, though, it's much easier to get the initial estimate from a LUT (lookup table) than from the subtraction trick.
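A LUT-based seed can be mimicked in software: the top mantissa bits index a small table of 1/m values, and the exponent is negated separately. With a 7-bit index and entries taken at each interval's midpoint, the relative error stays below about 1/256, i.e. roughly 8 accurate bits. The table size, names, and midpoint choice are assumptions for illustration; ldexpf stands in for the exponent handling that hardware would do directly on the bits.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

#define LUT_BITS 7                      /* 2^7 = 128 table entries */

static float recip_lut[1 << LUT_BITS];

/* Entry j holds 1/mid, where mid is the midpoint of the mantissa
   interval [1 + j/128, 1 + (j+1)/128). */
static void recip_lut_init(void)
{
    for (int j = 0; j < (1 << LUT_BITS); j++) {
        double mid = 1.0 + (j + 0.5) / (1 << LUT_BITS);
        recip_lut[j] = (float)(1.0 / mid);
    }
}

/* Table-based estimate of 1/b for positive, normal b.
   Write b = 2^E * m with m in [1, 2); then 1/b = 2^-E * (1/m). */
static float recip_lut_estimate(float b)
{
    uint32_t i;
    memcpy(&i, &b, sizeof i);
    /* Top LUT_BITS bits of the 23-bit mantissa select the entry. */
    uint32_t idx = (i >> (23 - LUT_BITS)) & ((1u << LUT_BITS) - 1u);
    int e = (int)((i >> 23) & 0xFFu) - 127;   /* unbiased exponent E */
    return ldexpf(recip_lut[idx], -e);
}
```

Call recip_lut_init() once before use. Each extra index bit halves the worst-case error, so the table roughly doubles in size for every additional accurate bit, which is the cost hardware designers trade off against the number of refinement steps.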
It's common for CPUs to give 6-8 accurate bits in the initial approximation. x86's RCPPS and RSQRTPS give about 12 accurate bits. Back in 1975, the Cray-1 gave 30 (!) accurate bits in its first approximation, and it didn't even have a division instruction (everything about that machine was big and fast).
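Why seed accuracy matters: each Newton-Raphson or Goldschmidt step roughly squares the relative error, so the number of accurate bits roughly doubles per step. A minimal sketch that counts the steps a seed needs to reach single (24-bit) or double (53-bit) precision, ignoring rounding and the small constant factors that make real convergence slightly slower than exact doubling; the function name is hypothetical.

```c
/* Number of refinement steps for a seed with `seed_bits` accurate bits
   to reach `target_bits`, assuming exact doubling per step.
   Returns -1 for a non-positive seed (no convergence guarantee). */
static int refinement_steps(int seed_bits, int target_bits)
{
    if (seed_bits <= 0)
        return -1;
    int steps = 0;
    for (int bits = seed_bits; bits < target_bits; bits *= 2)
        steps++;
    return steps;
}
```

Under this idealized model, an 8-bit seed needs 2 steps for float and 3 for double, a 12-bit seed needs 1 step for float, and a 30-bit seed reaches double precision in a single step.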