by scottlamb 6 days ago
Ugh, yeah, it's gross for `thunderbolt-net` to only support one link in total, though presumably that's fixable.

> - Intuition on why- I can't point you to the line number, but I think it has to do with a fixed 4kb page size when communicating with the NHI that ends up becoming a bottleneck, perhaps 16kb pages on aarch64 apple help here?

I'm used to page size making a difference (due to TLB pressure), but not by a factor of 2. I'm not familiar with DMA, so maybe there's some reason it'd be that dramatic there, but I'm unsure.
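One way to put bounds on the page-size theory: if each page-sized DMA descriptor carries a fixed overhead, going from 4 KiB to 16 KiB pages cuts the descriptor count by 4x, so the best case is a 4x speedup and the worst case is none. Here's a rough Python sketch of that toy model; the one-descriptor-per-page mapping and the cost parameters are my assumptions, not anything taken from the `thunderbolt-net` driver.

```python
# Toy model (hypothetical): each page-sized DMA descriptor costs a fixed
# per-descriptor overhead, plus a per-byte cost that page size doesn't change.


def descriptors_needed(transfer_bytes: int, page_bytes: int) -> int:
    """Page-sized descriptors needed to cover a transfer (ceiling division)."""
    return -(-transfer_bytes // page_bytes)


def transfer_time_ns(transfer_bytes: int, page_bytes: int,
                     per_desc_ns: float, per_byte_ns: float) -> float:
    """Time for one transfer: fixed cost per descriptor + cost per byte."""
    descs = descriptors_needed(transfer_bytes, page_bytes)
    return descs * per_desc_ns + transfer_bytes * per_byte_ns


def speedup_16k_over_4k(transfer_bytes: int, per_desc_ns: float,
                        per_byte_ns: float) -> float:
    """How much faster 16 KiB pages are than 4 KiB pages under this model."""
    t4 = transfer_time_ns(transfer_bytes, 4096, per_desc_ns, per_byte_ns)
    t16 = transfer_time_ns(transfer_bytes, 16384, per_desc_ns, per_byte_ns)
    return t4 / t16


MIB = 1 << 20
print(descriptors_needed(MIB, 4096), descriptors_needed(MIB, 16384))  # 256 64

# Bounds: per-descriptor overhead only -> 4x; per-byte cost only -> 1x.
print(speedup_16k_over_4k(MIB, per_desc_ns=1.0, per_byte_ns=0.0))  # 4.0
print(speedup_16k_over_4k(MIB, per_desc_ns=0.0, per_byte_ns=1.0))  # 1.0
```

Under this model a 2x gap is reachable, but only if per-descriptor overhead is about two-thirds of the total time at 4 KiB pages (e.g. `speedup_16k_over_4k(MIB, 1.0, 128 / MIB)` gives 2.0). Whether the real overhead is anywhere near that is exactly what I don't know.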

It might make more sense if the total buffer size is so small relative to the latency of draining it that the buffer frequently fills and stalls, or if the sender and receiver can't access it at the same time (though I don't think that should be true?). If I wanted to make this thing go more smoothly, I'd probably start by measuring the fraction of time the tx/rx buffers are completely empty and completely full.
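Something like this is what I have in mind for that measurement, assuming you can already sample each ring's occupancy periodically (e.g. by instrumenting the driver's ring indices, which this sketch doesn't cover). The `RingStats` name and the sample data are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RingStats:
    frac_empty: float  # share of samples with no entries in use
    frac_full: float   # share of samples with every entry in use


def ring_occupancy_stats(samples: Sequence[int], capacity: int) -> RingStats:
    """Summarize periodic samples of a ring's occupancy (entries in use)."""
    if not samples:
        raise ValueError("need at least one sample")
    if any(s < 0 or s > capacity for s in samples):
        raise ValueError("occupancy must be between 0 and capacity")
    n = len(samples)
    empty = sum(1 for s in samples if s == 0)
    full = sum(1 for s in samples if s == capacity)
    return RingStats(frac_empty=empty / n, frac_full=full / n)


# Made-up samples for an 8-entry tx ring, purely for illustration.
tx = ring_occupancy_stats([0, 0, 3, 8, 8, 8, 5, 0], capacity=8)
print(tx)  # RingStats(frac_empty=0.375, frac_full=0.375)
```

Roughly: a tx ring that's often full points at the drain side (or a ring that's too small for the drain latency), and one that's often empty points upstream of the ring. For rx, swap producer and consumer.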

Actually, I'm not sure I understand the text "we only have a single DMA ring for tx and rx" either. Does that mean one for tx and one for rx, or really one ring in total? If the latter, does it have to, say, drain fully before switching modes? That would seem pretty crippling.
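To make the "crippling" intuition concrete, here's a toy cost model comparing the two readings of "a single DMA ring for tx and rx". It assumes (hypothetically) unit-cost transfers, a fixed drain time, and that separate rings run fully in parallel; none of that comes from the driver.

```python
from typing import Sequence

# Toy cost model (all hypothetical): each transfer takes 1 time unit, and
# fully draining a ring takes `drain` units.


def separate_rings_time(ops: Sequence[str], drain: int) -> int:
    """One tx ring and one rx ring: the two directions proceed in parallel."""
    tx = sum(1 for op in ops if op == "tx")
    rx = sum(1 for op in ops if op == "rx")
    return max(tx, rx) + drain  # drain once at the end


def shared_ring_time(ops: Sequence[str], drain: int) -> int:
    """One shared ring that must drain fully before switching direction."""
    t = 0
    prev = None
    for op in ops:
        if prev is not None and op != prev:
            t += drain  # wait for in-flight entries before flipping mode
        t += 1
        prev = op
    return t + drain  # drain once at the end


# Alternating traffic, e.g. data one way and acknowledgements the other.
alternating = ["tx", "rx"] * 4
print(separate_rings_time(alternating, drain=10))  # 14
print(shared_ring_time(alternating, drain=10))     # 88
```

Grouping the same eight ops as four tx followed by four rx costs 28 in the shared model, so batching would help a lot, but ordinary TCP traffic interleaves data with ACKs going the other way, so I'd expect plenty of direction switches in practice.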