by IgorPartola 483 days ago
I am very curious, if these bugs are that common then why don’t we see more programs with weird bugs when they are running and especially having them be documented? Is it because when an unknown bug turns out to be a compiler bug and not a code error it gets fixed right away and with little fanfare? Or that there is some sort of resiliency built into the compiled code that can mask compiler bugs? Or is there some other factor?

Also, how easy is it to discover a compiler bug, and how easy is it to identify that a bug in your executable is due to a compiler bug?

3 comments

Compilers run enormous regression suites, and the CI/git/bisect/etc. style of development has made bugs harder to check in and quicker to squash in a lot of cases, I would say.
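For anyone who hasn't used bisecting: the idea is just a binary search over history for the first commit where a test starts failing. A rough sketch, not how git itself implements it; `first_bad` and `is_bad` are made-up names, and it assumes the oldest commit is good, the newest is bad, and the bug never disappears again once introduced:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit in an oldest-to-newest list.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]
```

In practice `is_bad` would build the compiler at that commit and run the failing test; the number of builds grows only logarithmically with the length of the history.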

I have found a number of compiler bugs in GCC and LLVM (and GAS and LLVM AS). Almost without fail they have been in the use of new features (certain new instructions, new ABI / addressing model) or esoteric things (linker script trickery, unusual use of extended inline asm), etc., where the compilers probably had no or very little "real" code to test against, other than presumably some simple things and basic unit tests from when said features were checked in.

Unless you're doing _really_ unusual things, or exercising new paths that don't just get picked up when compiling existing code (e.g., like many/most optimizations would), it's just not that likely you'll write code that triggers some unique path / state that has a noticeable bug.

To identify that a bug is a compiler bug, i.e. silent bad code generation, you basically assume the compiler is correct until you start to narrow the problem down to a state which should be impossible. After you put in enough assertions and breakpoints and logging (some of which might make the problem mysteriously go away) and reach the point of banging your head on the table, you start side-eyeing the compiler. If you know assembly you might start looking at some assembly output. Or you would start trying to make a reduced reproducer case, e.g., take the suspect function out on its own and make some unit tests for it. A tool like C-Reduce can sometimes help if it's not a relatively simple small function.
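To give a feel for what automated reduction does, here is a toy sketch of the idea: keep deleting chunks of lines as long as the failure still reproduces. This is not C-Reduce's actual algorithm (which understands C syntax); `reduce_lines` and `still_fails` are names I made up, and in real use `still_fails` would write the candidate to a file, compile it (say at two optimization levels), and compare the results.

```python
from typing import Callable, List, Sequence


def reduce_lines(lines: Sequence[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines while the failure keeps reproducing."""
    current = list(lines)
    if not still_fails(current):
        raise ValueError("the starting input does not reproduce the failure")
    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if still_fails(candidate):
                current = candidate  # smaller input still fails: keep it
            else:
                i += chunk           # this chunk is needed: move past it
        chunk //= 2                  # retry with finer-grained deletions
    return current
```

The result is not guaranteed to be minimal, since it makes a single pass per chunk size, but it is usually small enough to read by eye and attach to a bug report.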

How quickly you reach that point where you can actually start to narrow down on a possible compiler bug entirely depends on the problem. If it's causing some memory ordering or race condition or silent memory corruption that is only detected later or can only be reproduced at a customer sporadically, then who knows? Could be months, if ever. Others could be an almost immediate assert or error log or obvious bad result that you could debug and file a bug report in a day.

A significant factor in my experience is that a lot of programs are quite similar from a compiler perspective: they use a well-trodden set of features and combine them in a predictable way. Compiling those regular programs is well-tested and well-understood. Compiler bugs tend to be relegated to the exotic paths, when using language features in novel and interesting ways.
Large functions are a particular breeding ground.

Ages ago, working on PS2 games, one of our guys had a particularly huge "do-animations-and-interpolations-and-state-and-everything-for-the-hero-in-one-huge-switch" thingy (not uncommon to encounter in games) that crashed GCC, so the function was split up.

In the sequel I think a similar function grew enough that they not only had to split the function but also spread it across multiple files to avoid miscompiles.

Most recently I was generating an ORM binding (C#) from the database model of an ERP system. For mysterious reasons the C# runtime was crashing without stack traces, etc. (no debugger help). Having seen things like this before, I realized that one of the auto-generated functions was huge, so I split it up into multiple units and, lo and behold, it worked.

(Having written a tiny JVM once, I also remembered that jump instructions are limited to 64 KB; I'm not 100% sure whether the .NET runtime inherited that... once it worked I didn't put any effort into investigating the causes.)
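The kind of split I mean can be done in the generator itself. A minimal sketch in Python, not the actual generator: `chunk_statements`, `render_methods`, and the `Configure` method name are all hypothetical, and it assumes the generated statements are independent (no shared locals), which is often true for mapping/registration code but not in general.

```python
from typing import List, Sequence


def chunk_statements(statements: Sequence[str],
                     max_per_method: int) -> List[List[str]]:
    """Split a flat list of generated statements into bounded chunks."""
    if max_per_method < 1:
        raise ValueError("max_per_method must be at least 1")
    return [list(statements[i:i + max_per_method])
            for i in range(0, len(statements), max_per_method)]


def render_methods(statements: Sequence[str], max_per_method: int,
                   name: str = "Configure") -> str:
    """Emit C#-style source: one small method per chunk plus a driver."""
    chunks = chunk_statements(statements, max_per_method)
    parts = []
    for n, chunk in enumerate(chunks):
        body = "\n".join(f"    {s}" for s in chunk)
        parts.append(f"static void {name}Part{n}()\n{{\n{body}\n}}")
    # The driver keeps the original entry point and calls the parts in order.
    calls = "\n".join(f"    {name}Part{n}();" for n in range(len(chunks)))
    parts.append(f"static void {name}()\n{{\n{calls}\n}}")
    return "\n\n".join(parts)
```

Callers still see a single `Configure()` entry point, while no individual method body grows past `max_per_method` statements.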

Most of the time though compiler bugs aren't the worst (unless they help cause confusion in already hard scenarios).

> I am very curious, if these bugs are that common then why don’t we see more programs with weird bugs when they are running and especially having them be documented?

Any given program has N "native" bugs and M bugs introduced by the compiler. I think as long as N >> M you won't really notice. Even if you stumble across a compiler bug by chance, proving it is a nightmare: there's so much UB everywhere that any possible output is technically correct. Exceptions are compiler crashes but those are rare.

In my experience most compiler bugs were found by well-tested and proven software during an update of the compiler version or a switch between compilers. That kind of corresponds to the prerequisite of "N is small".