by yardstick 1883 days ago
Is a protocol change necessary here? Keepalives are already sent, and wouldn't they be held up if the TCP window hit 0? At that point the BGP/TCP session can be terminated and re-established.

BGP Keepalives are not request-reply, they are simple scheduled transmissions. Which means even if one side is not reading, it may still be sending keepalives. So the other side keeps the session open, despite its own keepalives sitting in its send queue.
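To make the "scheduled, not request-reply" point concrete, here is a minimal discrete-time sketch. It assumes a 90 s hold time and a 30 s keepalive interval (illustrative values), and models peer A as having stopped reading its socket while its keepalive timer keeps firing. Peer B keeps hearing A's keepalives, so B's hold timer never expires, while B's own keepalives just pile up. The function name `simulate` is hypothetical.

```python
# Hypothetical sketch: A has stopped reading, but A's keepalive timer
# still fires on schedule, so B keeps receiving keepalives from A.
HOLD_TIME = 90           # seconds (assumed)
KEEPALIVE_INTERVAL = 30  # seconds (assumed, one third of HOLD_TIME)

def simulate(duration: int) -> dict:
    b_last_heard = 0   # last time B received anything from A
    b_send_queue = 0   # keepalives B has queued that A never reads
    b_session_up = True
    for t in range(1, duration + 1):
        if t % KEEPALIVE_INTERVAL == 0:
            b_send_queue += 1  # B's keepalive sits unread in its send queue
            b_last_heard = t   # A's keepalive still arrives at B
        # B only drops the session if it hears nothing for HOLD_TIME.
        if t - b_last_heard >= HOLD_TIME:
            b_session_up = False
            break
    return {"b_session_up": b_session_up, "b_send_queue": b_send_queue}

print(simulate(3600))  # {'b_session_up': True, 'b_send_queue': 120}
```

After an hour B still considers the session healthy, with 120 of its own keepalives stuck behind a peer that never reads them.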

Also, any valid BGP message the reading side processes restarts its hold timer, so it only needs to occasionally pop something off the full queue and process it. That can still get done even if, say, you're swapping to hell and back. (That assumes its scheduling even gets around to killing sessions on hold-time expiry. It might not be expiring anything anymore, for reasons of floating face-down in the river.)
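In RFC 4271 terms, processing a received KEEPALIVE or UPDATE restarts the HoldTimer. The sketch below assumes a 90 s hold time and a hypothetical reader that drains one message from a backlog every `read_every` seconds. It shows that one message a minute is enough to keep the session alive indefinitely. `HoldTimer` and `slow_reader` are illustrative names, not any real implementation's API.

```python
from collections import deque

HOLD_TIME = 90  # seconds (assumed)

class HoldTimer:
    """Sketch of a hold timer: restarted by every KEEPALIVE/UPDATE processed."""
    def __init__(self, hold_time: int, now: int = 0):
        self.hold_time = hold_time
        self.deadline = now + hold_time

    def on_message(self, now: int) -> None:
        self.deadline = now + self.hold_time

    def expired(self, now: int) -> bool:
        return now >= self.deadline

def slow_reader(read_every: int, duration: int) -> bool:
    """True if the session survives `duration` seconds when the reader
    pops only one message off a full queue every `read_every` seconds."""
    backlog = deque(["UPDATE"] * 1000)  # hypothetical full receive queue
    timer = HoldTimer(HOLD_TIME)
    for t in range(1, duration + 1):
        if t % read_every == 0 and backlog:
            backlog.popleft()
            timer.on_message(t)  # any processed message restarts the timer
        if timer.expired(t):
            return False
    return True

print(slow_reader(60, 3600))   # True
print(slow_reader(120, 3600))  # False: hold timer expires at t = 90
```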

I think the argument is that if _your_ keepalives are held up, then currently you have to wait for _them_ to terminate the session. If they are malicious, or just not working well, they may never do this.
You can see that the window size is zero, though, so I think GP is suggesting sending a TCP reset or something similar.
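From the application's side, a closed window shows up as writes that stop being accepted. A minimal sketch, assuming a loopback TCP connection where the accepting side deliberately never reads: the non-blocking sender keeps writing until the kernel raises `BlockingIOError`, meaning the local send buffer is full because the peer's receive window has closed. The function name and chunk size are illustrative assumptions; the exact byte count depends on OS buffer sizes.

```python
import socket

def fill_until_stalled(chunk: int = 65536, limit: int = 256 * 1024 * 1024) -> int:
    """Write to a peer that never reads until the socket refuses more data.
    Returns the number of bytes the kernel accepted before stalling."""
    with socket.create_server(("127.0.0.1", 0)) as srv:
        port = srv.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _ = srv.accept()  # peer side: accepted, but never read
            with conn:
                client.setblocking(False)
                payload = b"\x00" * chunk
                sent = 0
                while sent < limit:
                    try:
                        sent += client.send(payload)
                    except BlockingIOError:
                        # Send buffer full: the peer's window has closed.
                        return sent
                raise RuntimeError("sender never stalled")

print(fill_until_stalled() > 0)  # True; exact size is OS-dependent
```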

Maybe this isn’t a good option because it would have too many undesirable side effects?

The RFC proposes to change the BGP finite-state machine, not the protocol.
Like you, I don't see how a protocol change is required; an update to the RFC saying that an implementation SHOULD time out the connection if the send window stays at zero would do. That said, I haven't gone through the specs with a fine-tooth comb, and perhaps there's something saying you MUST NOT drop the connection while you're still receiving keepalives?
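A "SHOULD time out if the send window stays at zero" rule could look roughly like this hypothetical send-side timer. It is restarted whenever outbound data makes progress, or whenever nothing is queued. It fires once data has been stuck for `limit` seconds. The 180 s limit and all names here are assumptions for illustration, not taken from any RFC or vendor implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SendHoldTimer:
    """Hypothetical: close the session if queued outbound data
    makes no progress for `limit` seconds (e.g. a zero TCP window)."""
    limit: int
    last_progress: int = 0

    def on_write_progress(self, now: int) -> None:
        # Called whenever the socket accepted at least one byte.
        self.last_progress = now

    def tick(self, now: int, pending_bytes: int) -> bool:
        if pending_bytes == 0:
            self.last_progress = now  # idle: nothing is waiting on the peer
            return False
        return now - self.last_progress >= self.limit

def run(stop_reading_at: int, limit: int, horizon: int) -> Optional[int]:
    """Return the time the session is closed, or None if it never is."""
    timer = SendHoldTimer(limit=limit)
    for t in range(horizon):
        peer_reading = t < stop_reading_at
        pending = 0 if peer_reading else 4096  # bytes stuck in our queue
        if peer_reading:
            timer.on_write_progress(t)
        if timer.tick(t, pending):
            return t
    return None

print(run(stop_reading_at=100, limit=180, horizon=1000))  # 279
```

As the thread notes further down, this only catches a peer whose window is stuck at zero; a peer that keeps reading (window open) but silently ignores your messages would never trip it.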

Get Cisco and Juniper to implement it and that's at least 75% of LINX covered; I assume other exchanges have a similar equipment mix.

It seems reasonable behaviour to me.

It doesn't prevent the problem of a malicious BGP peer, of course, but we know that already: if they choose to ignore your messages (while keeping your send window wide open) but continue to send keepalives, you're equally screwed.

If you don't put it in the RFC then you'll end up with five different solutions to this problem from five different vendors, and a nice 5x5 matrix of new hilarious edge cases when these are talking to each other and something wonky is happening to the TCP session.