by jcims 653 days ago
I probably shouldn't be commenting because I don't have any experience at this level, but given it's a closed system where they control supply and demand it seems they could manage away most congestion issues with scheduling/orchestration. They still have a primitive flow control in the protocol and it seems like you could create something akin to a virtual sliding window just by instrumenting the retransmits.

But now I am curious what the distribution of observed window sizes is in the wild.

Edit: I'd bet the simpler protocol is more vulnerable to various spoofing attacks though.

Edit2: Lol I hope the frame IDs are for illustrative purposes only - https://chipsandcheese.com/2024/08/27/teslas-ttpoe-at-hot-ch...

2 comments

In principle, with perfect knowledge of flows at any given instant, you can assign credits/rate-of-transmission for each flow to prevent congestion. But in practice this is somewhat nuanced to build, and there are various tradeoffs to consider: what happens if the flows are so short that coordinating with a centralised scheduler incurs a latency overhead that is comparable to the flow duration? There has been research demonstrating that one can strike a sweet spot, but I don't think it's practical, nor has it really been deployed in the wild. And of course, this scheduler has to be made reliable as it's a single point of failure.
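To make the "assign a rate to each flow" idea concrete, here is a minimal sketch of max-min fair (water-filling) allocation on a single bottleneck link. It is only an illustration of what a central scheduler might compute, not how any real system does it; the function name, the flow names, and the numbers are all made up.

```python
def max_min_fair(capacity, demands):
    """Split `capacity` across flows, never giving a flow more than it asked for.

    `demands` maps a (hypothetical) flow name to its requested rate.
    Single-link toy model: no paths, no latency, no scheduler failures.
    """
    alloc = {flow: 0.0 for flow in demands}
    remaining = float(capacity)
    active = {flow for flow, demand in demands.items() if demand > 0}

    while active and remaining > 1e-12:
        share = remaining / len(active)
        # Flows whose outstanding demand fits inside an equal share
        # can be fully satisfied in this round.
        satisfied = {f for f in active if demands[f] - alloc[f] <= share}
        if not satisfied:
            # Everyone wants more than an equal share: split evenly and stop.
            for f in active:
                alloc[f] += share
            remaining = 0.0
            break
        for f in satisfied:
            need = demands[f] - alloc[f]
            alloc[f] += need
            remaining -= need
        active -= satisfied

    return alloc


if __name__ == "__main__":
    print(max_min_fair(100, {"a": 10, "b": 50, "c": 80}))
```

Even in this toy form the scheduler needs every flow's demand up front, which is exactly the coordination cost that hurts for very short flows.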

Such ideas are, however, worth revisiting when the workload is unique enough (in this case, it is), and the performance gains are big enough...

Maybe the protocol could have arbitration built in? If one was clever you could actually have the front of the packet set a priority header, and build the collision detection/avoidance right into the header.

Multiple parties communicate at the same time? The lower-numbered priority could electrically pull the voltage low, dominating the transmission.

That way, priority messages always get through with no overhead or central communication required.
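This is essentially how CAN bus arbitration works, so a rough simulation may help. The sketch below assumes a shared wired-AND medium where a 0 bit is dominant and every sender reads the bus back while transmitting; the function name and the 8-bit width are illustrative choices, and none of this applies as-is to switched Ethernet.

```python
def arbitrate(ids, width=8):
    """Return the ID that wins bitwise arbitration on a wired-AND bus.

    Each contender sends its priority ID most-significant bit first.
    A 0 is dominant (pulls the line low); a sender that put a 1 on the
    wire but reads back a 0 knows it lost and stops transmitting.
    Assumes unique IDs that fit in `width` bits.
    """
    contenders = set(ids)
    if not contenders:
        raise ValueError("no senders on the bus")

    for bit in range(width - 1, -1, -1):
        sent = {i: (i >> bit) & 1 for i in contenders}
        bus = min(sent.values())  # wired-AND: any 0 wins the bit
        contenders = {i for i in contenders if sent[i] == bus}

    # With unique IDs exactly one contender is left.
    return min(contenders)


if __name__ == "__main__":
    print(hex(arbitrate([0x65, 0x12, 0x13])))
```

The winner's frame goes out undisturbed, so no bandwidth is lost to the collision, but the scheme relies on every sender seeing the same wire at the same bit time.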

Yep, such ideas have been around. But congestion is a fundamental problem. Admission control is the only way to ensure there is no congestion collapse.

The technical issue is that you would need global arbitration to ensure that the _goodput_ (useful bytes delivered) is optimal. With training across 32k GPUs and more these days, global arbitration to ensure the correct packets are prioritised is going to be very difficult. If you are sending more traffic than the receiver's link capacity, packets _will_ get dropped, and it's suboptimal to transmit those dropped packets into the network as they waste link capacity elsewhere (upstream) within the network.
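A back-of-the-envelope sketch of that last point, with made-up numbers: several senders target one receiver, and anything above the receiver's link rate is dropped after it has already consumed upstream capacity. This is a deliberately crude model (no buffering, no retransmissions), and the function name is hypothetical.

```python
def incast_goodput(sender_rates_gbps, receiver_link_gbps):
    """Return (delivered, dropped, wasted_fraction) for one receiver.

    Toy model: traffic above the receiver's link capacity is dropped,
    and every dropped byte still crossed the upstream links.
    """
    offered = sum(sender_rates_gbps)
    delivered = min(offered, receiver_link_gbps)
    dropped = offered - delivered
    wasted_fraction = dropped / offered if offered else 0.0
    return delivered, dropped, wasted_fraction


if __name__ == "__main__":
    # Four senders at 40 Gb/s each into a single 100 Gb/s receiver link.
    print(incast_goodput([40, 40, 40, 40], 100))
```

In this example more than a third of the traffic injected into the network never counts as goodput, which is the argument for holding it back at the sender instead.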

> I'd bet the simpler protocol is more vulnerable to various spoofing attacks though.

This is a protocol between compute nodes in a data center; it's layer 2, so there is no way to reach this over the internet.

That's how it always starts :)

But, point taken.