by jsnell 3737 days ago
Are you sure your summary of that thread is really accurate?

To me it reads as Chris providing an exceptionally detailed bug report (including the exact code paths triggering the problem, and statistics from lockstat and dtrace on the lock in question). Nobody in the thread asks for more information (why would you want a "crash dump" for a non-crashing bug anyway?). Everyone seems to agree that the drivers are in fact taking spinlocks for long periods of time, while holding other locks. Nobody talks about "hardware known to be bad". What is talked about is how it's been too long since the drivers were last synced with upstream.


It is detailed, but in all the wrong ways: instead of describing the problem that he's seeing and offering data, he has jumped to a code path that he believes is inducing it -- without much in the way of supporting evidence. And yes, he talks about bad HW ("access to the second port on the card currently fails to acquire swfw sync"). The ensuing discussion is more of a desultory wandering than it is a deliberate investigation into his problem -- which isn't surprising, because he hasn't described a problem but merely an observed artifact in the system. (Long lock hold times can easily be misleading; when exploring latency bubbles, one needs to be very careful about tying observed behavior to the latency outliers, lest one discover problems without discovering "the" problem.)

So yes, I stand by my summary of the thread.