Hacker News new | ask | show | jobs
by Aloisius 66 days ago
There does appear to be a bug, but it's not what the blog describes.

If tcp_now stops updating at <= 2^32 - 30000 milliseconds, then TSTMP_GEQ(tcp_now, timer) will always fail since timer is tcp_now + 30000 which won't wrap.

This does look like it is possible since calculate_tcp_clock() which updates tcp_now only runs when there's TCP traffic. So if at 49 days uptime you halted all TCP traffic and waited about a day, tcp_now would be stuck at the value before you halted TCP traffic.

In cases where tcp_now gets stuck at > 2^32 - 30000, it looks like TCP sockets in the TIME_WAIT will end up being closed immediately instead of waiting 30 seconds, which isn't great either.

2 comments

Are you sure?

tcp_now’s maximum cannot physically reach 2^32 because the trailing zeros of that number exceeds the bit width of data type.

Therefore, tcp_now + 30000 will wrap when tcp_now is equal to 2^32 - 3000. Your inequality sign should be strict <, otherwise the result does not follow.

Yes, you are correct. Bad editing on my part.

It should be that if tcp_now gets stuck before (<) (2^32 - 30000) ms from boot, it would cause deadline timers for reaping TCP_WAIT would always be greater than tcp_now because it wouldn't wrap. If stuck at or after (>=) (2^32 - 30000), it would cause them to potentially be reaped faster they should be.

Actually looking at the code a bit more, it looks like calculate_tcp_clock() is run at least once per hour even when there's no TCP traffic or sockets open, so getting into the state where it never reaps TIME_WAIT sockets which would be hard to predict if this would happen.

It also looks like if tcp_now gets stuck, other tcp timers may have problems as well.

yep that makes sense