| HN Mirror



	by myrandomcomment 2218 days ago
	What are you trying to achieve here? What tools are you going to get and how do you want to use them? You do understand that the switching ASIC is just a PCI device, right, and that you cannot just pump all its bandwidth into the CPU for review? The path between the data plane (ASIC) and the control plane (CPU) is limited, generally only a few gigs in today's high-end switches. Anything you want to do in the data plane has to be programmed on the ASIC. The only packets punted to the CPU are control plane ones that need software processing and are low bandwidth, such as LLDP, STP, BGP control, etc. This is done by programming a switch ASIC table called "my station" or "l2 user". On some kit you can tcpdump a front panel port to the CPU, but it is rate limited, as otherwise you can kill the CPU or stop processing of vital control plane packets (let's DDOS the STP process, fun). Looking at traffic flow on the CPU on a 32x100G is not going to happen. You need to sample, so sFlow, NetFlow, etc. So given the limited bandwidth, and that any tool needs to know how to translate your Linux configuration into Ethernet ASIC pipeline programming, what is it you want to do that you cannot do today?

	Random note. I worked at a switching startup (a few). At one we always ran our own latest code. After an update to a core switch everything looked good, but then people started to complain that things were very slow. Went looking. The switch looked fine but was dropping traffic towards the CPU, which should not happen. In checking the Cacti graphs for that switch (10 second polling), all the graphs for the ports between the different networks showed exactly the same flat line at a max of 134MB/s on 10G ports. Hmm, strange. Hold on, that sounds like the max BW between the ASIC and the CPU port! Let's check some bits in the ASIC configuration. Yup. The new build forgot to set HW routing on in the pipeline, so every packet was punted to the CPU for route processing. Luckily the control plane policy had the STP etc. packets in a higher queue. Tweak the bit, blam, graphs go to 11 :) File bug.
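To put rough numbers on the punt-path limit described above, here is a minimal back-of-the-envelope sketch in Python. The 134MB/s and 32x100G figures come from the comment; the 2 Gbit/s CPU path is only an assumed stand-in for "a few gigs", decimal megabytes are assumed, and the helper names are hypothetical.

```python
import math

def mbytes_per_s_to_gbits_per_s(mbytes_per_s: float) -> float:
    """Convert MB/s (assumed decimal, 10^6 bytes) to Gbit/s (10^9 bits)."""
    return mbytes_per_s * 8 / 1000

def min_sampling_ratio(front_panel_gbps: float, cpu_path_gbps: float) -> int:
    """Smallest N such that copying 1-in-N of the traffic fits in the CPU path.

    Assumes whole packets are copied to the CPU; samplers that export only
    truncated headers would need less bandwidth than this.
    """
    return math.ceil(front_panel_gbps / cpu_path_gbps)

# The flat line on the graphs: 134 MB/s is roughly a 1 Gbit/s link.
flat_line_gbps = mbytes_per_s_to_gbits_per_s(134)

# A 32x100G switch against an ASSUMED 2 Gbit/s ASIC-to-CPU path.
front_panel_gbps = 32 * 100
cpu_path_gbps = 2.0  # assumption, not a figure from the thread
ratio = min_sampling_ratio(front_panel_gbps, cpu_path_gbps)

print(f"flat line is about {flat_line_gbps:.3f} Gbit/s")
print(f"need to sample at most 1 in {ratio} packets")
```

Under those assumptions the flat line works out to a bit over 1 Gbit/s, consistent with everything being squeezed through the CPU port, and a fully loaded 32x100G box could hand the CPU at most about one packet in 1600.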

1 comment

teleforce 2218 days ago

I think it is a shame and a mistake that the PC industry chose PCI-Express over the battle-tested InfiniBand technology as the upgrade for PCI [1]. InfiniBand offers a native channel-based peer-to-peer connection fabric for disparate nodes, and most of the important CPU bottleneck tasks (e.g. memory protection & address translation) can be outsourced to the InfiniBand controller instead of the proprietary ASIC networking controller.

The bottleneck is affecting not only networking but the GPU industry as well. That's probably the main reason why Nvidia bit the bullet and bought major InfiniBand player Mellanox in a deal worth close to USD 7 billion. The bottleneck is just bearable for video and games, but not when you have to scale the processing of big data, AI and machine learning applications.

[1] https://www.mellanox.com/pdf/whitepapers/PCI_3GIO_IB_WP_120....
