by rzezeski 1792 days ago
> During those tests, we noticed the machines were randomly freezing after some time, so we decided to upgrade the firmware of the network cards,

Reminds me of the various i40e Tx freezes I debugged while at Joyent. Granted, this is the illumos driver, not Intel's, but basically there were issues with the programming guide that I had to figure out the hard way. The 700-series controllers have not been the easiest to work with.

https://smartos.org/bugview/OS-7492 [Tx freeze when b_cont chain exceeds 8 descriptors]

https://smartos.org/bugview/OS-7457 [i40e Tx freezes on zero descriptors]


This 8 descriptor per packet limit is HORRIFIC. I debugged and fixed this issue on FreeBSD when we first moved to the new iflib based ixl (i40e's name on FreeBSD).

They had a routine (ixl_tso_detect_sparse()) which was AFU. I wrote a userspace unit test that proved it was AFU, and then fixed it. I fed them back the routine & the unit test, and they hilariously left my commented debug prints in the routine.

https://github.com/freebsd/freebsd-src/blob/412b5e40a721430a...

And their 100GbE NIC has the same limit, which is just so sad. All these fancy features, and they cannot handle more than 8 segments per emitted packet on the wire.
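
For anyone wondering what a sparse-chain check looks like, here is a rough sketch in Python. It is not the FreeBSD routine. It only models the idea described above: walk the buffer chain, carve it into MSS-sized wire segments, and count how many buffers each segment touches. The names `max_buffers_per_segment` and `needs_linearize` are made up for illustration, the limit of 8 is taken from this thread, and header descriptors are deliberately not modeled, so treat the exact accounting as an assumption.

```python
# Illustrative only: a simplified model of a "sparse chain" check.
# Assumption: the NIC can gather at most 8 buffers for any one packet
# it emits on the wire (the limit discussed in this thread). Header
# descriptors are NOT modeled here.
MAX_DESCS_PER_PACKET = 8


def max_buffers_per_segment(buf_lens, mss):
    """Return the largest number of buffers touched by any single
    MSS-sized wire segment when the payload is split in order."""
    if mss <= 0:
        raise ValueError("mss must be positive")

    worst = 0        # worst-case buffer count seen so far
    count = 0        # buffers touched by the segment being built
    remaining = mss  # payload bytes still needed to fill that segment

    for length in buf_lens:
        # A single buffer may feed several consecutive wire segments.
        while length > 0:
            take = min(length, remaining)
            length -= take
            remaining -= take
            count += 1  # this buffer contributes to the current segment
            if remaining == 0:
                # Segment is full: record it and start the next one.
                worst = max(worst, count)
                count = 0
                remaining = mss

    # Account for a final, partially filled segment.
    return max(worst, count)


def needs_linearize(buf_lens, mss):
    """True if some wire segment would need more than the assumed
    descriptor limit, i.e. the chain must be coalesced before Tx."""
    return max_buffers_per_segment(buf_lens, mss) > MAX_DESCS_PER_PACKET
```

With twenty 100-byte buffers and an MSS of 1000, every wire segment spans ten buffers, so `needs_linearize([100] * 20, 1000)` flags the chain. Four 1500-byte buffers with an MSS of 1448 never touch more than two buffers per segment and pass. A userspace test of this kind is cheap to write, and that is how the bug in the real routine was proven.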

AFU?
all fucked up
It was the same with Intel's drivers. We had the same issues on ESXi servers with X710 NICs, first with VMware's repackaged driver and then with Intel's original i40e driver. It worked terribly, either kernel panicking or just freezing the NICs. It was a fun one to debug, but thankfully we only had to wait a few months (the issue had been known for a year) for Intel to come up with a fixed driver.

The bastards at VMware kept the buggy driver on their hardware compatibility list and kept shipping it for multiple versions, probably for another year.

Was the ixgbe driver not available?
IIRC the options were i40e and i40en, the latter resulting in daily crashes, so i40e it was :)
From your first one:

Malicious Driver Detection

My reaction upon reading that line was "WTF." I haven't touched NIC drivers beyond the classic NE2000s, common Realteks, and the Intel 8254x, but it seems strange to have some sort of... antimalware feature in a NIC? Reminds me of the old BIOSes with "boot sector antivirus".

Probably more to do with the fact that everything is moving towards virtualization. Oftentimes these NICs dole out VFs directly to VMs via SR-IOV, in which case I imagine the NIC controller has some safeguards to keep the host and the rest of the guests safe from denial-of-service and other attacks from a malicious guest driver.