by ternaryoperator 4703 days ago
I'm curious why compiler testing appears to be so hard. It seems to me that given this C input, this AST should be built, and on this platform, this code should be generated. This should be testable through automated scripts, and numerous test cases could be created for new features, resulting in huge regression suites. Am I missing something, or is the effort of putting together such tests the problem?

gcc does have a huge test suite.

The problem is that if you combine all the various flags that affect the compiler, across all the architectures, across all the platforms, in all its variants (cross compiler, native, the handful of libc and barebones variants), you're looking at too many tests to run, no matter how large an infrastructure you have to run them.
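To make the combinatorial explosion concrete, here is a back-of-the-envelope calculation. Every number in it (the axis sizes, the count of independent on/off flags, and the suite size) is a made-up assumption for illustration, not a real gcc figure.

```python
from math import prod

# Hypothetical sizes for each axis of a compiler test matrix.
# These numbers are illustrative assumptions, not real gcc figures.
AXES = {
    "target_architecture": 30,
    "host_platform": 5,
    "build_variant": 3,       # e.g. native, cross, barebones
    "libc_variant": 6,
    "optimization_level": 6,  # e.g. -O0, -O1, -O2, -O3, -Os, -Og
}
INDEPENDENT_ON_OFF_FLAGS = 20      # hypothetical; each one doubles the matrix
TESTS_PER_CONFIGURATION = 100_000  # hypothetical suite size


def configuration_count(axes, on_off_flags):
    """Size of the full cross product of every axis and every on/off flag."""
    return prod(axes.values()) * 2 ** on_off_flags


total = configuration_count(AXES, INDEPENDENT_ON_OFF_FLAGS)
print(f"{total:,} configurations")
print(f"{total * TESTS_PER_CONFIGURATION:,} individual test executions")
```

With these invented numbers the full cross product is already about 17 billion configurations before a single test is run, which is why nobody runs the whole matrix.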

Another problem is that optimization depends a lot on context: given the (basically infinite) amount of C code that could surround any other piece of C code and affect the result, it's quite a hard task.

One interesting approach is Csmith (http://embed.cs.utah.edu/csmith/), which generates random C programs to look for bugs.
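The core idea behind Csmith is differential testing: generate random programs whose behavior is well defined, run them through different compilers or optimization settings, and flag any disagreement. The toy sketch below applies that idea to a made-up three-operator expression language instead of C. The expression format, `optimize`, its reassociation rule and the deliberately injected sign error are all hypothetical illustrations, not anything taken from Csmith.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit unsigned integers that wrap around on overflow

# Hypothetical toy language: ("var",), ("const", n), or (op, left, right).
OPS = ("+", "-", "*")


def random_expr(rng, depth):
    """Generate a random expression tree (a drastically simplified nod to Csmith)."""
    if depth == 0 or rng.random() < 0.3:
        return ("var",) if rng.random() < 0.5 else ("const", rng.randrange(2**32))
    return (rng.choice(OPS), random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def apply_op(op, a, b):
    """Apply one operator with 32-bit wraparound."""
    if op == "+":
        return (a + b) & MASK
    if op == "-":
        return (a - b) & MASK
    return (a * b) & MASK


def evaluate(expr, x):
    """Reference semantics: plain recursive evaluation, no optimization."""
    if expr[0] == "const":
        return expr[1]
    if expr[0] == "var":
        return x
    return apply_op(expr[0], evaluate(expr[1], x), evaluate(expr[2], x))


def optimize(expr, buggy=False):
    """Toy optimizer: bottom-up constant folding plus one reassociation rule.

    Correct rule:  a - (b - c)  ->  (a - b) + c
    Buggy variant: a - (b - c)  ->  (a - b) - c   (deliberately injected sign error)
    """
    if expr[0] in ("const", "var"):
        return expr
    op = expr[0]
    left, right = optimize(expr[1], buggy), optimize(expr[2], buggy)
    if left[0] == "const" and right[0] == "const":
        return ("const", apply_op(op, left[1], right[1]))
    if op == "-" and right[0] == "-":
        outer = "-" if buggy else "+"
        return (outer, ("-", left, right[1]), right[2])
    return (op, left, right)


def differential_test(num_programs, buggy, seed=0):
    """Run random programs unoptimized and optimized; return those that disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(num_programs):
        program = random_expr(rng, depth=4)
        optimized = optimize(program, buggy)
        for _ in range(3):  # try a few random inputs per program
            x = rng.randrange(2**32)
            if evaluate(program, x) != evaluate(optimized, x):
                mismatches.append((program, x))
                break
    return mismatches


print("correct optimizer:", len(differential_test(1000, buggy=False)), "mismatches")
print("buggy optimizer:  ", len(differential_test(1000, buggy=True)), "mismatches")
```

Note that nothing here checks the optimizer's output against a hand-written expected tree; only observable results are compared, which is what lets random programs stand in for hand-written test cases.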

Their PLDI paper, "Finding and Understanding Bugs in C Compilers", is an amazing read: http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf

> The problem is if you combine all the various flags that affect the compiler, across all the architectures, across all the platforms, in all its variants (cross compiler, native, the many handful of libc and barebones variants) you're looking at too many tests to run no matter how huge an infrastructure you have to run it.

As someone who at one point in time maintained such a compiler test system, I'll say that it isn't possible to get all combinations, but you can hit a reasonable percentage of them.

A good compiler test run ends up running through millions of tests. It isn't for the faint of heart, but it is perfectly doable.
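One hypothetical way to "hit a reasonable percentage" without enumerating everything is to sample a fixed budget of configurations per run, seeding the sampler with the run number so each run is reproducible while successive runs cover different corners of the matrix. This is a sketch under that assumption, not necessarily what the commenter's system did; the axis names and values are invented.

```python
import random

# Hypothetical test-matrix axes; names and values are illustrative only.
AXES = {
    "target": ["x86_64", "aarch64", "riscv64", "arm", "mips", "powerpc64le"],
    "variant": ["native", "cross", "barebones"],
    "libc": ["glibc", "musl", "newlib"],
    "opt_level": ["-O0", "-O1", "-O2", "-O3", "-Os"],
    "lto": [False, True],
}


def full_matrix_size(axes):
    """Number of configurations in the full cross product."""
    size = 1
    for values in axes.values():
        size *= len(values)
    return size


def sample_configurations(axes, budget, run_id):
    """Pick `budget` distinct configurations for one test run.

    Seeding with the run number keeps a run reproducible, while successive
    runs (e.g. nightly builds) explore different slices of the matrix."""
    rng = random.Random(run_id)
    target = min(budget, full_matrix_size(axes))
    chosen = set()
    while len(chosen) < target:
        chosen.add(tuple(rng.choice(values) for values in axes.values()))
    return sorted(chosen)


runs = [sample_configurations(AXES, budget=50, run_id=night) for night in range(7)]
covered = set().union(*runs)
print(f"{len(covered)} of {full_matrix_size(AXES)} configurations covered in a week")
```

Any failure found this way can be replayed exactly from its `run_id`, which matters more than raw coverage when someone has to debug it.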

Producing a compiler that is robust on a single platform is relatively straightforward. You just bash away for a while; with wide usage the corner cases will be exposed and you can converge to a finished system.

Add multiple platforms, and it gets more difficult, because each platform has quirks and peculiarities you don't know about until you get a bug report.

Add optimisation of languages that are difficult to automatically reason about, and it gets harder again.

Multiply the optimisation difficulties by multi-platform difficulties and yes: it is hard.

No, because optimisations change both the AST and the generated code.

The problem is the optimization phase. Without optimizations, bugs are extremely rare.