by jcranmer 483 days ago
Compilers are multimillion-line programs, and they have an error rate which is commensurate with multimillion-line programs.

That said, I think like half the bugs I see get filed against the compiler aren't actually compiler bugs but errors in user code--and this is already using the filter of "took the trouble to file a compiler bug." So it's a pretty good rule of thumb that it's not a compiler bug, unless you understand the compiler rules well enough to articulate why it can't be user error.


It's not quite half the bugs on GCC's bug tracker, but it's very high: https://gcc.gnu.org/bugzilla/report.cgi?x_axis_field=&y_axis...

It's around 10% invalid bugs and another 10% duplicates. A lot of them that I've seen, including one of mine, are a result of misinterpreting details of language standards.

Compilers have a huge advantage over other programs: they are fully deterministic since they depend only on input files, command line arguments and a few environment variables. This makes bugs easier to reproduce and fix compared to interactive applications, programs with networking, multi-threading...
Pretty sure most modern compilers are multithreaded, and do exhibit a slew of practical nondeterminisms, which is how/why projects like Reproducible Builds were formed.
In general, compilers are single-threaded for most of the compilation process--at the very least, compiling a single file (translation unit) is almost always done using just one thread.

However, nondeterminism does creep into the compiler in various places. Sorting an array by pointer value is an easy way to get nondeterminism. But the most common nondeterminism in a build system comes not from the compiler but from the filesystem--"for file in directory" usually sorts the files by inode, which is effectively nondeterministic across different computers.
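
A rough Python sketch of both cases above, not taken from any real compiler or build system: `list_sources`, `Symbol`, `order_by_address` and `order_by_name` are made-up names for illustration. In CPython, `id()` is the object's address, so sorting by it is the closest analogue to sorting by pointer value.

```python
import os
import tempfile


def list_sources(directory):
    """Return file names in a fixed order, independent of the filesystem."""
    # os.listdir() makes no promise about ordering, so sort by name to get
    # the same sequence on every machine.
    return sorted(os.listdir(directory))


class Symbol:
    """Hypothetical stand-in for some compiler-internal object."""

    def __init__(self, name):
        self.name = name


def order_by_address(symbols):
    # id() is the object's address in CPython, so this order can change
    # from one run to the next.
    return sorted(symbols, key=id)


def order_by_name(symbols):
    # A key derived from the data itself is stable across runs.
    return sorted(symbols, key=lambda s: s.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("b.c", "a.c", "c.c"):
            open(os.path.join(d, name), "w").close()
        print(list_sources(d))

    syms = [Symbol(n) for n in ("zeta", "alpha", "mid")]
    print([s.name for s in order_by_name(syms)])
```

The fix in both cases is the same idea: order by something that is part of the input (a name) rather than something the environment happened to hand you (an address or a directory listing order).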

Yes, that's why I was so careful with the wording. Timestamps are another example.
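
For the timestamp case, the Reproducible Builds project defines a `SOURCE_DATE_EPOCH` environment variable that tools can read instead of the wall clock. A minimal sketch of honoring it, where `build_timestamp` is a hypothetical helper and not part of any particular tool:

```python
import os
import time


def build_timestamp(environ=os.environ):
    """Return the timestamp (seconds since the epoch) to embed in output."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        # Pinned by the build environment: every rebuild embeds the same value.
        return int(value)
    # Fallback: the current time, which differs on every run.
    return int(time.time())


if __name__ == "__main__":
    print(build_timestamp({"SOURCE_DATE_EPOCH": "1700000000"}))
```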