Right... the parent is saying that 4x10G is pointless compared to 4x2.5G, because PCIe 4 lanes will top out at forwarding around 7gbps of traffic.
You can't do line rate on all ports either (limited by PCIe alone, let alone CPU for smaller packets), but you can certainly fill an individual port, which I suspect is the goal.
> because PCIe 4 lanes will top out at forwarding around 7gbps of traffic [...] limited by PCIe alone
Are you sure about that? With PCIe 2.0 at 5 GT/s (or 500 MB/s after 8b/10b encoding) per lane, 4 lanes give about 2 GB/s, or 16 Gbps, which should be plenty, no? Intel adapters like the X520-DA2 are specced at 2x 10G, and use PCIe 2.0 x8.
FWIW, I was also able to push around 3.7 Gbps with iperf3 on an X520-DA2 connected to an RPi4's single-lane PCIe 2.0.
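For anyone who wants to check the arithmetic, here is a rough sketch of the per-direction numbers. It assumes PCIe 2.0 signalling (5 GT/s per lane, 8b/10b encoding) and ignores TLP/DLLP headers and descriptor traffic, so real throughput will land somewhat lower. The function name and the port counts are only illustrative.

```python
# Back-of-the-envelope PCIe 2.0 bandwidth per direction.
# Assumptions: 5 GT/s per lane, 8b/10b encoding (8 payload bits per 10 line bits),
# no allowance for packet headers, flow control, or DMA descriptor reads.

def pcie2_payload_gbps(lanes: int) -> float:
    """Upper bound on usable Gbps in one direction for a PCIe 2.0 link."""
    raw_gbps = 5.0 * lanes       # 5 GT/s per lane, one bit per transfer
    return raw_gbps * 8 / 10     # 8b/10b encoding leaves 80% for data

for lanes in (1, 4, 8):
    print(f"PCIe 2.0 x{lanes}: {pcie2_payload_gbps(lanes):.1f} Gbps per direction")

# Compare an x4 link against one 10G port and four 10G ports at line rate.
x4 = pcie2_payload_gbps(4)
for ports in (1, 4):
    needed = 10 * ports
    verdict = "fits" if needed <= x4 else "does not fit"
    print(f"{ports} x 10G needs {needed} Gbps per direction: {verdict} in x4")
```

By this estimate x1 caps out at 4 Gbps (consistent with the ~3.7 Gbps RPi4 result), x4 at 16 Gbps, and x8 at 32 Gbps: enough to fill one 10G port on x4, but not four at once.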
But PCIe is full duplex! With PCIe 2.0 x4 there are 4 lanes in each direction [1], so when 'forwarding' over a single 10G link you can expect to send and receive simultaneously at the speed I mentioned earlier.
Yah, I guess dividing by 2 isn't fair. But transmitting does impact receiving and vice versa: when you're reading DMA descriptors, you need to wait for the read completions to come back, etc. It's not fully uncontended between send and receive, but it's less contended than a naive division by 2 would imply.