by cesarb 1387 days ago
At a previous company I worked at, we had an issue with our software (Windows-based, written in a proprietary language) randomly crashing. After some debugging, we found that this happened whenever the user performed certain actions, but only if, earlier in that session, the user had printed something or opened a file picker. The culprit was either a printer driver or a shell extension which, when loaded, changed the floating point control word to trap on floating point exceptions instead of masking them. That happened whenever the culprit DLL had been compiled by a specific compiler, which had the offending code in the startup routine it linked into every DLL it produced.

Our solution was the inverse of the one presented in this article: instead of wrapping our routines to temporarily set the floating point control word to sane values, we wrapped the calls to either printing or the file picker, and reset the floating point control word to its previous (and sane) value after these calls.
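
For anyone wanting to try the same approach, here is a minimal sketch of the "restore after the call" idea in portable C++, using the standard `<cfenv>` functions rather than whatever the original proprietary language offered. `FpEnvGuard`, `call_preserving_fp_env` and `misbehaving_call` are made-up names for illustration, and the stand-in only changes the rounding mode, since that is the part of the state that can be toggled portably. On Windows the control word itself would more likely be handled with `_controlfp_s`; that is an assumption about the setup, not something stated above.

```cpp
#include <cfenv>

// Hypothetical RAII guard: captures the calling thread's floating-point
// environment on construction and restores it on destruction.
class FpEnvGuard {
public:
    FpEnvGuard() { std::fegetenv(&saved_); }
    ~FpEnvGuard() { std::fesetenv(&saved_); }
    FpEnvGuard(const FpEnvGuard&) = delete;
    FpEnvGuard& operator=(const FpEnvGuard&) = delete;

private:
    std::fenv_t saved_;
};

// Stand-in for a call that ends up loading misbehaving third-party code
// (printing, a file picker, ...). It only changes the rounding mode here.
inline void misbehaving_call() {
    std::fesetround(FE_UPWARD);
}

// Runs any callable and puts the caller's FP environment back afterwards,
// even if the callable throws.
template <typename F>
void call_preserving_fp_env(F&& f) {
    FpEnvGuard guard;
    f();
}
```

Calling `call_preserving_fp_env(misbehaving_call)` leaves the rounding mode as the caller had it, while calling `misbehaving_call()` directly leaves it at `FE_UPWARD`. The floating-point environment is per-thread, so the guard has to sit on the thread that makes the call.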


Had to deal with this same issue when I had a program supporting plugins: DLLs compiled with Delphi would turn on all the floating point traps. Took a while to track down what was causing FP faults in comctl32.dll. It got so bad that I had to put in a popup dialog that would name and shame the offending DLL so the authors would fix their broken plugins. It's an ABI violation on Windows, since the ABI specifically defines FPU exceptions as masked, so this was more egregious than just turning on FTZ/DAZ (which Intel-compiled DLLs did).
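
A rough sketch of how such a "name and shame" check could look, assuming only portable C++: snapshot the observable floating-point state, call the plugin, compare, and restore. All names here (`FpSnapshot`, `check_plugin_fp_hygiene`, the plugin names in the usage note) are hypothetical. It only looks at the rounding mode and at whether subnormals are being flushed; detecting unmasked traps needs platform-specific calls, which this sketch deliberately leaves out.

```cpp
#include <cfenv>
#include <limits>
#include <string>

// Portable subset of the FP state worth comparing before and after a call.
struct FpSnapshot {
    int rounding_mode;
    bool flushes_subnormals;
};

// Detects flush-to-zero behaviour by halving the smallest normal double.
// volatile keeps the compiler from folding the division at compile time.
inline bool probe_flush_to_zero() {
    volatile double smallest_normal = std::numeric_limits<double>::min();
    volatile double half = smallest_normal / 2.0;
    return half == 0.0;
}

inline FpSnapshot take_fp_snapshot() {
    return FpSnapshot{std::fegetround(), probe_flush_to_zero()};
}

// Calls a plugin entry point and returns a report naming the plugin if it
// left the FP state different from how it found it; empty string otherwise.
// The caller's environment is restored in either case.
template <typename F>
std::string check_plugin_fp_hygiene(const std::string& plugin_name, F&& entry_point) {
    std::fenv_t saved;
    std::fegetenv(&saved);
    const FpSnapshot before = take_fp_snapshot();
    entry_point();
    const FpSnapshot after = take_fp_snapshot();
    std::fesetenv(&saved);

    std::string report;
    if (after.rounding_mode != before.rounding_mode) {
        report += plugin_name + " changed the rounding mode. ";
    }
    if (after.flushes_subnormals != before.flushes_subnormals) {
        report += plugin_name + " changed subnormal handling. ";
    }
    return report;
}
```

For example, `check_plugin_fp_hygiene("bad.dll", [] { std::fesetround(FE_TOWARDZERO); })` returns a message mentioning `bad.dll`, while a well-behaved entry point yields an empty string.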

Many of these same DLLs would also hijack SetUnhandledExceptionFilter() for their custom exception support, which would also result in hard fastfail crashes when they failed to unhook properly. Ended up having to hotpatch SetUnhandledExceptionFilter() Detours-style to prevent my crash reporting filter from being overridden. Years later, Microsoft revealed that Office had done the same thing for the same reasons.

The new version of this problem is DLLs that use AVX instructions and then don't execute a VZEROALL/VZEROUPPER instruction before returning. This is more sinister as it doesn't cause a failure; it just causes SSE2 code to run up to four times slower in the thread.

I was interested in the last point about AVX instructions, and found https://john-h-k.github.io/VexTransitionPenalties.html which discusses the problem.
Yep, I've encountered floating point flag incompatibilities when dynamically loading Borland-compiled libraries into Visual Studio compiled applications, as well as when using C++ code via Java Native Interface.

It is nice that diverse vendor-specific calling conventions and ABIs are less common these days.

You could also get an issue with x87/MMX where floating point code wouldn't work if you wrote some MMX code and didn't do an `emms` instruction afterward.

This is basically the reason compiler autovectorization doesn't do MMX.

That is one hell of a war story - I didn't realize that kind of failure was even possible, but it is truly terrifying.
Direct3D used to flip the x87 FPU to single precision mode by default. This produced some amazing bugs when your other C libraries reasonably assumed that a double would be at least 64 bits. (The FPU mode settings affected the thread that called Direct3D, and most programs used to be single-threaded.)
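
To make the failure mode concrete, here is a small sketch that emulates reduced precision by rounding each result through `float`. It does not touch the x87 control word at all, so treat it as an illustration of what a 24-bit significand does to values stored in a `double`, not as a reproduction of the Direct3D behaviour. The function names are made up for the example.

```cpp
// Rounds a double to the nearest value with a 24-bit significand by going
// through float. The volatile forces the narrowing to actually happen.
inline double round_to_single_precision(double x) {
    volatile float narrowed = static_cast<float>(x);
    return static_cast<double>(narrowed);
}

// Adds two doubles, then rounds the sum as if the FPU were running in
// single precision mode.
inline double add_at_single_precision(double a, double b) {
    return round_to_single_precision(a + b);
}
```

With full double precision, 16777216.0 + 1.0 is 16777217.0. Through `add_at_single_precision` the same sum comes back as 16777216.0, because 2^24 + 1 does not fit in 24 bits. Any library that keeps counters, timestamps or integers larger than 2^24 in a `double` breaks in this way. Small sums such as 5.0 + 5.0 stay exact.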

It seems they changed this behavior in Direct3D 10:

https://microsoft.public.win32.programmer.directx.graphics.n...

I stumbled into this bug in a rather spectacular manner.

I was making a game using D3D, Lua and Chipmunk physics, and some of the game's behaviour was odd.

So I started printing random stuff with Lua. Eventually I just tried print(5+5), and to my surprise the console printed "11".

I went into Lua's IRC channel to talk about this, and everyone said I was nuts, that the number was too small to trigger precision issues, that I was a troll and so on.

After a lot of searching I found out about this D3D bug, so I switched the game to use OpenGL instead, and there it was: 5+5 = 10 again!

Now why fiddling with the FPU could make 5+5 become 11, I have no idea.

If it ends up as 10+epsilon after the loss of precision and for some reason the fp round mode is set to FE_UPWARD ... And some part of the Lua stack recognizes that 5+5 is an integer-yielding expression and casts it as one for the display...
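
A quick sketch of what a directed rounding mode does and does not change, assuming a platform where `std::fesetround` is honoured (the helper names are made up). An inexact result like 1/3 moves by one unit in the last place depending on the mode, but 5 + 5 is exactly representable, so the addition on its own yields exactly 10 under every mode. The extra epsilon in the guess above would have to come from some other step, for example the number-to-string conversion.

```cpp
#include <cfenv>

// Divides a by b under the given rounding mode (e.g. FE_UPWARD), then
// restores the previous mode. The volatile variables block constant
// folding and keep the division between the two fesetround calls.
inline double divide_with_rounding(double a, double b, int mode) {
    const int previous = std::fegetround();
    std::fesetround(mode);
    volatile double va = a;
    volatile double vb = b;
    volatile double result = va / vb;
    std::fesetround(previous);
    return result;
}

// Same idea for addition.
inline double add_with_rounding(double a, double b, int mode) {
    const int previous = std::fegetround();
    std::fesetround(mode);
    volatile double va = a;
    volatile double vb = b;
    volatile double result = va + vb;
    std::fesetround(previous);
    return result;
}
```

Here `divide_with_rounding(1.0, 3.0, FE_UPWARD)` is strictly greater than `divide_with_rounding(1.0, 3.0, FE_DOWNWARD)`, while `add_with_rounding(5.0, 5.0, mode)` equals 10.0 for either mode.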
Had this exact same problem. It was a specific color inkjet driver doing this; my guess is that it was to enable dithering or something similar. It’s one of those things that infects everything in the code base, because the way you print with GDI is to progressively draw parts of the page, so you have to call in and out of code that talks to the printer DC. We also had to render one item using Direct3D retained mode, and that added to the FP control word complexity. Things seemed to be more robust on NT-based OSes.
Kernel drivers present even more fun options for corrupting CPU FPU and SIMD state. IRQs etc. only preserve integer scalar register values.

Of course that's why floating point math is mostly a big no-no in driver code. Unless the driver preserves and restores FPU state on its own.

I've heard so many stories akin to this one that I just shake my head. It's a self-inflicted wound that people who prioritize performance above other considerations keep inflicting on everyone else.

I hope we learned our lessons on this specific question in the design of Wasm. There are subnormals in Wasm and you can't turn them off for performance.

I think the problem is libraries implicitly affecting code outside the library. This time it was related to optimization of floating point operations; next time it will be something else. Why bother having lexically scoped languages if the real behavior is dynamic? Debugging this kind of error is very hard.
Agreed. Side effects on global state are generally bad. It would not have been as bad to introduce the FTZ mode in a way that wasn't global state, but alas, the performance mode itself was the original sin. There is apparently zero overhead for subnormals on PPC and very little on ARM. It's always been Intel pushing this crap because of their FPU designs' shortcomings.
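
For readers who have not run into subnormals directly, here is a small sketch of the property that FTZ gives up, assuming the default environment with gradual underflow enabled (in other words, not built with fast-math style options). The function names are made up. With subnormals, two distinct doubles never subtract to zero. With flush-to-zero they can, which is exactly the kind of change a library can impose on its caller through global state.

```cpp
#include <limits>

// True when halving the smallest normal double yields a nonzero
// (subnormal) value, i.e. gradual underflow is in effect on this thread.
inline bool gradual_underflow_active() {
    volatile double smallest_normal = std::numeric_limits<double>::min();
    volatile double half = smallest_normal / 2.0;
    return half > 0.0;
}

// True when x - y is nonzero. With gradual underflow this holds for any
// two distinct finite doubles; under flush-to-zero it can fail for tiny ones.
inline bool difference_is_nonzero(double x, double y) {
    volatile double vx = x;
    volatile double vy = y;
    volatile double diff = vx - vy;
    return diff != 0.0;
}
```

With `m = std::numeric_limits<double>::min()`, the call `difference_is_nonzero(1.5 * m, m)` is true under gradual underflow, because the difference `0.5 * m` is a subnormal. Under FTZ that difference would be flushed to zero even though the two inputs differ.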