| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by desc 2321 days ago
	sigh If your application is slow, odds are it's not because you used exceptions. If you're throwing enough exceptions for this to matter it'll show up on a profiler, and then you can change that specific chunk of code to avoid treating that particular case as 'exceptional'.

4 comments

gameswithgo 2321 days ago

Tim Sweeney, creator of Unreal Engine, Epic Megagames, etc, recently had a code base where exceptions were causing problems even when not actively being thrown:

https://twitter.com/TimSweeneyEpic/status/122307740466037145...

Quoting him in a follow up tweet:

>They weren’t throwing and a disassembly showed no visible artifacts of exceptions. But turning off the possibility of throwing exceptions gained 15% and just made the assembly code tighter with no clear pattern to the improvements.

abainbridge 2321 days ago

How does he know that the performance improvement wasn't just a side effect of altering the alignment of his code? Here's a guy getting a 48% performance improvement just by randomizing the alignment of his functions: https://youtu.be/Ho3bCIJcMcc?t=351 or slides if you prefer https://github.com/dendibakh/dendibakh.github.io/blob/master...

gameswithgo 2321 days ago

I am very familiar with this effect. We could of course use effects like this to cast doubt on any benchmark someone has ever run, unless they specifically mention that they tested for this, and 100 other bench marking gotchas. We can, if we like, assert that all things are unknowable, while at the same time asserting that the thing we want to be true is for sure true.

However, it appears to be a rather commonplace occurrence that having exceptions on, even if you aren't using them, can cause performance problems, it isn't just a one of in Tim's case. Also Tim has quite a bit of experience working on bleeding edge C and C++ performance code so there is a good chance he did account for this. You can ask him.

int_19h 2321 days ago

How common is it, actually? And on what platforms/architectures? It was a common problem back in the day when most code was compiled for x86, since exceptions weren't designed to be zero-cost in that ABI.

For what it's worth, the article itself has this bit:

"Thanks to the zero-cost exception model used in most C++ implementations (see section 5.4 of TR18015), the code in a try block runs without any overhead."

abainbridge 2321 days ago

It has been at least a 10% effect in the last two things I profiled, which were a simple software rasteriser and a distributed key value store. The other significant benchmarking gotchas I see are: 1) the CPU being in a sleep mode when the test starts and takes a while to get running at full speed, and 2) other stuff running on the machine causing interference. But these two are easy to work around compared to the alignment-sensitivity problem.

abainbridge 2321 days ago

I didn't mean to disagree with the conclusion. My point was more that it is hard to be confident in the causes of results like this. It'd be great if we had tools that could randomise alignments etc, so we could take lots of samples and get more confidence. As far as I know those tools don't exist and we just have to use experience.

int_19h 2321 days ago

This sounds like a straight up optimizer bug, and should be reported as such.

And if profiling does show up a case like that, the appropriate response to that is slapping noexcept on the function, not disabling exceptions globally.

desc 2321 days ago

Indeed, and this was measured, and the improvement was worth it. Profiling! :P The cost isn't always where we think it is.

Still, 'turn off exceptions because exceptions are slow' is a daft rule of thumb for the majority of software, where the slowness probably has more to do with choice of data structures, etc than compiler/platform implementation of language features.

Always measure first, last, and in between.

gpderetta 2321 days ago

I think that in theory there should be no difference between code that can throw and that can't from the point of view of the optimizer. In practice I think that sometimes some code motion passes are disabled in the presence of abnormal control flow because the compensation code that would be otherwise required becomes unwieldy and hard to prove correct.

lovecg 2321 days ago

Some case might be “exceptional” in the sense that it doesn’t happen most of the time (and thus doesn’t show up on regular profile checks), but when it does happen the failures are highly correlated and suddenly you’re throwing thousands of exceptions per second all over the place.

This is also the time one finds out that the exception handling code on typical C++ runtimes take out a global lock and the multithreaded application grinds to a halt—the raw CPU cost of an exception is not the most pressing problem at this point.

AnimalMuppet 2321 days ago

Wait, what? An exception takes out a global lock?

Do you know which runtimes do this? Or, which runtimes don't?

gpderetta 2321 days ago

libstc++ does. And IIRC the reason is the usual culprit: dlclose [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744

BubRoss 2321 days ago

malloc takes out a global lock and performance doesn't 'grind to a halt' from a single allocation. Hundreds of thousands of allocations per second per core and concurrency will suffer, but most programs either don't actually have nearly enough concurrency or minimize their allocations to the point where it isn't a primary bottleneck.

throwaway17_17 2321 days ago

I think the context you’re working in matters heavily with such an assertion. In fact, it is quite common in game development to bulk allocate and then use custom allocators within these regions specifically avoid the drag on performance and determinism inflicted by malloc.

BubRoss 2321 days ago

I don't think this confronts anything I said.

Bulk allocations with custom allocators would either be exactly what I said (low malloc calls) or would just not use malloc at all and use the system memory mapping functions.

The larger point was that an exception being thrown and taking a global lock is not going to tank performance. That would imply that all other threads are trying to take the same lock at the same time and that the thread that gets it holds it for a long time.

Even in the case of malloc, where there could be actual heavy lock contention, this is not always a bottleneck.

jcelerier 2321 days ago

> malloc takes out a global lock

not anymore for a couple years with glibc : https://www.phoronix.com/scan.php?page=news_item&px=glibc-ma...

lovecg 2321 days ago

Of course, it would depend on the workload. An exception here and there wouldn’t matter, but for a somewhat contrived example, using exceptions for something like a bad connection to a database on a server with dozens of threads could easily turn into a concurrency and a resource exhaustion problem (where every thread starts seeing the same error at the same time).

zabzonk 2321 days ago

> malloc takes out a global lock

Says who? The C Standard says nothing of the kind.

BubRoss 2321 days ago

All modern common implementations (I think). MSVC's malloc definitely locks. I think gcc, clang and icc do too. This is a big reason why tcmalloc and jemalloc were created.

gpderetta 2321 days ago

I don't know about MSVC, but glibc malloc has per thread caches. At some point you need to hit the global allocator or sbrk or mmap of course and that might take a global lock.

abainbridge 2321 days ago

Yeah, it would be interesting to see the overhead exceptions add when no exception is thrown.

abainbridge 2321 days ago

Here's a godbolt link: https://godbolt.org/z/Z6vYAS

Looking at the disassembly the machine code is ~2x the size for the exception versions, but most of it is on the cold path.

The exception version has a conditional branch to do the "== errorInt" part. The non-exception version manages to avoid the conditional branch by using a conditional move, which would avoid a pipeline stall on a branch mis-prediction.

Edit: I think this disproves desc's point ("If your application is slow <because of execptions> it'll show up on a profiler"). ie there's probably a small cost to exceptions even when they are not taken and it will be spread across your entire program and will not show as a single spike in a profiler.

gpderetta 2320 days ago

> The non-exception version manages to avoid the conditional branch by using a conditional move, which would avoid a pipeline stall on a branch mis-prediction.

branches are usually superior to conditional moves for predictable conditions as they break dependency chains. In case the exceptional code path is taken, the cost of the misprediciton is dwarfed by the cost of unwinding the stack.

This is interesting actually, the fact that the compiler uses a conditional move in the error checking case could mean that the compiler has no useful branch probability model for that branch in the error checking case, but even when using __builtin_expect, the compiler still prefers the conditional move.

abainbridge 2320 days ago

> branches are usually superior to conditional moves for predictable conditions as they break dependency chains.

Interesting, not heard that before. Do you know of somewhere I can read about this?

gpderetta 2320 days ago

Agner Fog is the usual go-to reference. For this specific case, you can also google any of Linus rants on conditional moves (they used to be very high latency, although today they are not so much of an issue). This one for example: https://yarchive.net/comp/linux/cmov.html

ncmncm 2320 days ago

It is complicated to describe when cmov is slow and when it is fast. As a rule of thumb, if the next loop iteration data operations depend on a cmov in this one, and around, cmov will be slow. If not, it is very, very fast. Use of cmov can make quicksort 2x as fast.

Gcc absolutely won't generate two cmov instructions in a basic block. Clang, for its part, abandons practically all optimization of loops that could conceivably generate a throw.

abainbridge 2320 days ago

Nice. Like every other topic, there's more complexity if you keep looking harder.

abainbridge 2321 days ago

The problem with benchmarks is that I never see any that estimate the impact of the extra code size on programs the size of, say, Photoshop. It takes annoying long to load such a program. Is code size part of that problem? Probably. Is the bloat added by exceptions significant? I'd like to know.

ncmncm 2321 days ago

When it takes a program too long to load, it is because the program is doing too much non-exception work. The exception-handling code is not even being loaded unless it's throwing while it loads, which would just be bad design.

abainbridge 2321 days ago

I think the exception code _is_ loaded. It is only a theoretical possibility that loading it could be avoided.

I just built the following code with g++ v7.4 (from MSYS64 on Windows):

    #include <math.h>
    #include <stdexcept>

    void exitWithMessageException() {
        if (random() == 4321)
            throw std::runtime_error("Halt! Who goes there?");
    }

    int main() {
        exitWithMessageException();
        return 1234;
    }

The generated code mixed the exception handlers with the hot-path code. Here are the address ranges of relevant chunks:

    100401080 - Hot path of exitWithMessageException
    100401099

    10040109a - Cold path of exitWithMessageException
    10040113f

    100401140 - Start of main

gpderetta 2321 days ago

Interesting, GCC 7.x seems to simply puts the cold branch on a separate nop-padded cacheline.

GCC 9 [1] instead moves the exception throwing branch into a cold clone of exitWithMessageException function. The behaviour seems to have changed on starting from GCC 8.x.

[1] https://godbolt.org/z/PKKZ8m

nurettin 2321 days ago

Code paths introduced in order to execute any potential stack unrolling are inefficient and they make your code slow. Especially tight loops. This was common knowledge back in 2000s.

ncmncm 2320 days ago

Common knowledge, but not correct. Code to destroy objects has to be generated for regular function returns, and is jumped into by the exception handler too. Managing resources by hand, instead, would also require code, but you have to write it. Its expense arises from its fragility.

nurettin 2319 days ago

What I meant was inserting stack frames into assembly, which are dissimilar to calling free, slowing things down.

lallysingh 2320 days ago

But the relevant comparison is the cost of exception handling vs the cost of manual error checking.

Of course if you don't check for or otherwise handle errors, the program will be faster. It's literally doing less work.

abainbridge 2320 days ago

OK, another Godbolt link: https://godbolt.org/z/SiRvBR

This one adds functions that call the exception-based and error code based functions in a simple for loop. Both handle the error.

Unless I've screwed up somewhere, I think the result is that in the exception case, the body of the inner loop contains 13 instructions, while the error code case contains 5.

Also, the generated code for the exception case is harder to read and understand. When writing performance critical code I like to eye-ball the disassembly just to make sure the compiler didn't do anything unexpected. This task is hard enough already in non-trivial functions, I certainly don't want it getting any harder.

munk-a 2321 days ago

Also generally having slow code is less a problem then expensive code - exceptions can allow you to use more succinct expression that will lower long term maintenance costs - the full lack of exceptions is one of the things that I think continues to impair Go which has a lot of other neat ideas (and a really good way to deal with sub-addressing in the form of slices).

zozbot234 2321 days ago

Doesn't Go have panic-recover which is basically the same deal?

munk-a 2321 days ago

Panic/recover are much more limited and while you can build an exception system from them (much like all turing complete things can be all other turing complete things) it is a pain and extremely inefficient. So panics do exist and are used for I/O errors sometimes but it's quite inconsistent.