Great writeup, and it also thoroughly answers the first question that popped into my mind: "how on earth could a bug in the Linux network stack that causes the whole data transfer to get stuck stay undiscovered for so long?"
"Most applications will care about network timeouts and will either fail or reconnect, making it appear as a “random network glitch” and leaving no trace to debug behind."
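To make that concrete, here is a minimal sketch of the pattern the quote describes, in Python. The names (`fetch_with_retry`, `make_flaky_fetch`) are made up for illustration and are not from the article, and the "stall" is simulated by raising a timeout rather than by a real kernel bug. The point is only that a timeout followed by a silent retry discards the stuck connection without recording anything about it:

```python
import socket


def fetch_with_retry(fetch, attempts=3):
    """Call fetch() until it succeeds, retrying whenever it times out.

    `fetch` is a hypothetical zero-argument callable that opens a fresh
    connection and returns the received bytes.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except socket.timeout:
            # The stalled transfer is simply abandoned here. Nothing about
            # the connection state is kept, so the stall leaves no trace.
            if attempt == attempts:
                raise


def make_flaky_fetch(stalls):
    """Return a fake fetch that times out `stalls` times, then succeeds."""
    state = {"calls": 0}

    def fetch():
        state["calls"] += 1
        if state["calls"] <= stalls:
            raise socket.timeout("simulated stalled transfer")
        return b"payload"

    fetch.state = state  # exposed so callers can count the attempts
    return fetch
```

From the caller's point of view, a fetch that stalls once and then succeeds on a new connection looks the same as one that never stalled, which is why this kind of bug tends to get written off as a glitch.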