Hacker News new | ask | show | jobs
by joosters 720 days ago
Is this really C++ specific though? It seems like the optimisations are happening on a lower level, and so would 'infect' other languages too.

Whatever the language, at some point in performance tweaking you will end up having to look at the assembly produced by your compiler, and discovering all kinds of surprises.

1 comments

LLVM isn't perfect, but the problem here is that there's a C++ compiler flag (-ffast-math) which says OK, disregard how arithmetic actually works, we're going to promise across the entire codebase that we don't actually care and that's fine.

This is nonsense, but it's really common, distressingly common, for C and C++ programmers to use this sort of inappropriate global modelling. It's something which cannot scale, it works OK for one man projects, "Oh, I use the Special Goose Mode to make routine A better, so even though normal Elephants can't Teleport I need to remember that in Special Goose Mode the Elephants in routine B might Teleport". In practice you'll screw this up, but it feels like you'll get it right often enough to be valuable.

In a large project where we're doing software engineering this is complete nonsense, now Jenny, the newest member of the team working on routine A, will see that obviously Special Goose Mode is a great idea, and turn it on, whereupon the entirely different team handling routine B find that their fucking Elephants can now Teleport. WTF.

The need to never do this is why I was glad to see Rust stabilize (e.g) u32::unchecked_add fairly recently. This (unsafe obviously) method says no, I don't want checked arithmetic, or wrapping, or saturating, I want you to assume this cannot overflow. I am formally promising that this addition is never going to overflow, in order to squeeze out the last drops of performance.

Notice that's not a global flag. I can write let a = unsafe { b.unchecked_add(c) }; in just one place in a 50MLOC system, and for just that one place the compiler can go absolutely wild optimising for the promise that overflows never happen - and yet right next door, even on the next line, I can write let x = y + z; and that still gets the kid gloves, if it overflows nothing catches on fire. That's how granular this needs to be to be useful, unlike C++ -ffast-math.

You can set fast math (or a subset of it) on a translation unit basis.
Because the language works by textual inclusion and so a "translation unit" isn't really just your code this is much more likely to result in nasty surprises, up to and including ODR violations.
Yes, if you try hard enough I'm sure you can find ways to screw up.

From a practical point of view it is fine.

That's not really true. If you link a shared library that was compiled with -ffast-math, that will affect the entire program. https://moyix.blogspot.com/2022/09/someones-been-messing-wit...
To enable this "feature" I believe you have to specify fast math while linking the .so, it is not enough to do it when compiling.

It had also been fixed recent GCC versions.

Or even on a per function basis, at least with gcc (no clue about clang...)
In principle yes, you can use Attribute optimize, but I wouldn't rely on it. Too many bugs open against it.
It's worked for me in the past but maybe I got lucky.
LLVM actually also supports instruction-level granularity for fast-math (using essentially the same mechanism as things like unchecked_add), but Clang doesn't expose that level of control.
clang does have pragma clang fp to enable a subset of fast math flags within a scope
If you're using floating point at all you have declared you don't care about determinism or absolute precision across platforms.

Fast math is simply saying "I care even less than IEEE"

This is perfectly appropriate in many settings, but _especially_ video games where such deterministic results are completely irrelevant.

Actually, floating-point math is mostly deterministic. There is an exception for rounding errors in transcendental functions.

The perception of nondeterminism came specifically from x87, which had 80-bit native floating-point registers, which were different from every other platform's 64-bit default, and forcing values to 64-bit all the time cost performance, so compilers secretly turned data types to different ones when compiling for x87, therefore giving different results. It would be like if the compiler for ARM secretly changed every use of 'float' into 'double'.

The existence of a theoretically pure floating-point behavior doesn't have any impact on the reality of the implementations we program for.
> This is perfectly appropriate in many settings, but _especially_ video games where such deterministic results are completely irrelevant.

I'm not sure I'd agree. Off the top of my head, a potentially significant benefit to deterministic floating-point is allowing you to send updates/inputs instead of world state in multiplayer games, which could be a substantial improvement in network traffic demand. It would also allow for smaller/simpler cross-machine/platform replays, though I don't know how much that feature is desired in comparison.

Indeed, but this isn't hypothetical. A large fraction of multiplayer games operate by sending inputs and requiring the simulation to be deterministic.

(Some of those are single-platform or lack cross-play support, and thus only need consistency between different machines running the same build. That makes compiler optimizations less of an issue. However, some do support cross-play, and thus need consistency between different builds – using different compilers – of the same source code.)

The existence of situations where the optimization is not appropriate, such as multiplayer input replay, does not invalidate the many situations where it is appropriate.
Sure, but I'm not sure anyone is trying to argue that -ffast-math is never appropriate. I was just trying to point out that your original claim that deterministic floating point is "completely irrelevant" to games was too strong.