by bcantrill
3732 days ago
Yes, and the "big and gnarly" issues that he alludes to are in fact a driver issue that has been seen only by him and brought up exactly once by him on the mailing list -- and that was a year and a half ago.[1] There was lots of discussion at the time, the conclusions being that (1) he was advocating changes that were deemed unsafe and (2) his most serious problems were seen on hardware known to be bad.

The driver that he's referring to (ixgbe) is in very widespread production on illumos (albeit likely more frequently over optics than the copper that he has deployed); to the degree that there's a driver issue here at all (and that's not a foregone conclusion!), it seems likely that there is something specific to his environment that is inducing it. Certainly, with no one else seeing the issue and without better information from him (e.g., a kernel crash dump that indisputably shows an ixgbe-level issue), it's hard to see how anyone could expect any real progress to be made on this issue -- illumos or otherwise, open source or otherwise.

tl;dr: This in no way represents the "limits of open source" -- but it does highlight the limits of relying on other people to magically solve your problems for you.

[1] https://www.listbox.com/member/archive/182179/2014/10/search...
To me it reads as Chris providing an exceptionally detailed bug report (including the exact code paths triggering the problem, and statistics from lockstat and dtrace on the lock in question). Nobody in the thread asks for more information (why would you want a "crash dump" for a non-crashing bug anyway?). Everyone seems to agree that the drivers are in fact taking spinlocks for long periods of time, while holding other locks. Nobody talks about "hardware known to be bad". What is talked about is how it's been too long since the drivers were last synced with upstream.