|
|
|
|
|
by oasisbob
820 days ago
|
|
Feels like some stateful device within someone's network mishanding the connection state, like the author guesses. It's interesting that your side thinks the three-way handshake worked, but the remote side continues to resend the [SYN, ACK] packets, as if they've never received the final [ACK] from you. Had a hellish time troubleshooting a similar problem several years ago with F5 load balancers - there was a bug in the hashing implementation used to assign TCP flows to different CPUs. If you hit this bug (parts per thousand), your connection would be assigned to a CPU with no record of that flow existing, so the connection would be alive, but would no longer pass packets. Would take a long time for the local TCP stack to go through its exponential retries and finally decide to drop the connection and start over . |
|
We diagnosed the same(ish) bug in first generation F5 LBs in the 90s[1]. Figured exhaustive testing for this would have been SOP by now.
[1] To be fair, almost all 1st gen LBs had at least one major "send the packet to the wrong place and the state table gets screwed up" bug.