| "Note: Plenty of people will bring up the Pentium FDIV bug here, but the reason we didn’t include it is simple: Despite being an enormous marketing failure for Intel and a huge expense, the actual bug was tiny." The fact that the fault was tiny and that few people were affected is definitely NOT the point. The so-called Pentium 'bug' was the result of fundamentally terrible engineering on Intel's part, in that the underlying design wasn't fit for purpose - it wasn't just a bug. It seems to me the authors of this story do not understand that what Intel did was fundamentally wrong, in that its math processing was flawed by design from the outset; otherwise they would have included the Pentium in their list. To increase math processing speed, Intel split its mathematics algorithms into part algorithm and part lookup table, instead of having the algorithm complete the whole task (which is the logical way of doing things). If the algorithm itself were wrong, then every calculation would be wrong and the problem would be obvious from the outset. Adding a lookup table makes calculations faster, but one then has to test every combination in the lookup table - and Intel didn't. Look at the problem like this - think of a set of log or trig tables, now think of the implications if one of those table entries is incorrect. What Intel did was deliberate cheating and it failed to get away with it. Intel would have known this from the outset and thus the problem was an integral design fault rather than a bug. Intel knowingly implemented a design that had flawed data integrity at its most fundamental level. What Intel did was so nasty that it's hard to think of how it could have made matters worse even if it had deliberately tried to introduce a fault.
In my opinion, any company that would stoop to tactics as unethical as the ones Intel used in the Pentium's design has demonstrated that it cannot be trusted - and I've never trusted Intel from that point onward. If anyone ever needs a reason why processors should have open design architectures that are subject to third-party scrutiny, this is the quintessential example. |
There's a great writeup with the results of Intel's internal investigation [2], which outlines the challenge in testing production chips for this sort of bug. A key point:
> The fraction of the total input number space that is prone to failure is 1.14 x 10^-10.
So around 1 in 9 billion possible numerator/denominator pairs exhibit the bug. Testing 9 billion double-precision FDIV divides on a 60MHz Pentium would take almost four days (about 3.6), if my math checks out and the CPU could do 2.5 billion divides per 24 hours.
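For anyone who wants to check the arithmetic, here is a rough sketch in Python. The failure fraction is the figure quoted from the report; the 2.5 billion divides per day is just my assumed throughput from above, not a measured number, and the function names are made up for illustration. The last function adds one more assumption: that test pairs are drawn uniformly and independently at random, in which case a run of 9 billion divides is still not guaranteed to hit a bad pair.

```python
import math

# Figure quoted from the report, plus the throughput assumed in this comment.
FAILURE_FRACTION = 1.14e-10   # fraction of input pairs that trigger the bug
DIVIDES_PER_DAY = 2.5e9       # ASSUMPTION: divides per 24 hours, not measured


def pairs_per_failure(fraction):
    """Average number of input pairs per failing pair (reciprocal of the fraction)."""
    return 1.0 / fraction


def days_to_test(n_divides, divides_per_day):
    """Days needed to run n_divides at the assumed daily throughput."""
    return n_divides / divides_per_day


def prob_at_least_one_hit(n_divides, fraction):
    """Chance of seeing at least one failure in n_divides random trials.

    ASSUMPTION: pairs are sampled uniformly and independently.
    Computes 1 - (1 - fraction)**n_divides in a numerically stable way.
    """
    return -math.expm1(n_divides * math.log1p(-fraction))


if __name__ == "__main__":
    print(f"about 1 in {pairs_per_failure(FAILURE_FRACTION) / 1e9:.2f} billion pairs")
    print(f"{days_to_test(9e9, DIVIDES_PER_DAY):.1f} days for 9 billion divides")
    print(f"chance of at least one hit: {prob_at_least_one_hit(9e9, FAILURE_FRACTION):.2f}")
```

Under those assumptions, random testing for almost four days finds the bug only about two times out of three, which makes the point about production testing even stronger.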
[1]: https://en.wikipedia.org/wiki/Division_algorithm#SRT_divisio...
[2]: https://users.fmi.uni-jena.de/~nez/rechnerarithmetik_5/fdiv_...