The clock drifts. Something needs to keep count of those seconds. Even when the drift is small, the phasing distortion becomes pretty obvious on long recordings.
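To put rough numbers on how a small drift adds up, here is a minimal sketch. The function names are made up for illustration, and the 10 ppm, one-hour and 48 kHz figures are assumed values, not measurements from any particular device.

```python
def drift_seconds(duration_s: float, drift_ppm: float) -> float:
    """Accumulated time error after duration_s for a clock that is off by drift_ppm."""
    # 1 ppm means one microsecond of error per second of elapsed time.
    return duration_s * drift_ppm * 1e-6


def drift_samples(duration_s: float, drift_ppm: float, sample_rate_hz: float) -> float:
    """The same error expressed in audio samples at the given sample rate."""
    return drift_seconds(duration_s, drift_ppm) * sample_rate_hz


if __name__ == "__main__":
    # Assumed example: one hour of recording, 10 ppm offset, 48 kHz audio.
    err_s = drift_seconds(3600, 10)
    err_n = drift_samples(3600, 10, 48_000)
    print(f"error: {err_s * 1000:.1f} ms, about {err_n:.0f} samples")
```

Under those assumptions, two free-running recorders would end up tens of milliseconds apart after an hour, which is far more than enough to hear when the tracks are mixed.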
There's some interesting work going on in the AES to support synchronised audio over wide area networks, either through better recovery of PTP clocks distributed over WANs or by using PTP together with GNSS.
https://www.youtube.com/watch?v=tG7tCCKYDx4