From what I read, they can tell differences when having the randomness in the hundredths-of-seconds (specifically said they patched to go from 10s to 10,000µs). The randomness I mention, assuming N is the maximum number of tolerable system time for successful login, should be 2N + random(2N, ~100N). Or just store a time at login start and force hit the same deadline every time (via sleep of diff from start) then add randomness on top. Of course, additional brute force detection/protection is ideal for repeated failures.
The random sleep is to prevent obtaining enough samples and provide reasonable noise at this small sample size. Given enough samples until the end of time, patterns can be obtained. This is not breaking TLS here, this is login, and seconds of sleep vs microseconds makes a difference.
Randomness inserted like that can be filtered out statistically with a decent sample set - it's a gaussian distribution, so you'll still see the saying timing differences.
This is why things have to be constant time, rather than random return time.
The random sleep is to prevent obtaining enough samples and provide reasonable noise at this small sample size. Given enough samples until the end of time, patterns can be obtained. This is not breaking TLS here, this is login, and seconds of sleep vs microseconds makes a difference.