Hacker News new | ask | show | jobs
by TheNewAndy 817 days ago
I have made a (internal only unfortunately) meta build system for my work. Instead of being Turing complete is uses Horn Clauses as the primary mode of logic. This means that the problem of determining "is this compiler flag set" or "is this file used in this build" is always solvable in polynomial time.

It has been in use for over ten years by multiple teams around the world for all sorts of whacky toolchaims and projects.

I don't believe anyone has missed Turing completeness.

Despite all that, I agree that if you need a metabuild system then things are probably too complicated. I guess the main reason I would argue for one is when building a cross platform thing, it is nice if developers using toolchain X don't need to modify build files for all the toolchains that aren't X - making Linux developers have to fight Visual Studio just to add a new file to a build is not ideal. But in principle - I still agree that the resulting question of "how do I build this thing from scratch?" Should probably be answerable in 2 sentences (e.g. "Compile and link all the C files with the top level directory in the include path. Don't compile files that have an operating system in their name unless you are building for that os")

3 comments

Interesting. Can you give some details what you did there. What exactly is a horn clause modeling here?
Horn clauses just mean you can have a whole bunch of things saying "if x then y is true" where X is constrained to be a monotonic (i.e. no negation) Boolean function (i.e. OR and AND).

Given some starting known true values in the system (e.g. debug_build=true, platform_is_windows=true, build_curl_library=true) you solve to find the minimum set of other variables that need to be true.

Each variable can then have some effects tied to it (e.g. if build_curl_library is true, then make sure curl.c is compiled and curl/include is in the include path)

Being Horn Clauses, the solving is basically a flood fill - as soon as you know a variable is true, it is locked in as true. If there was negation, then you can have weird contradictions and reasoning about the system starts being really difficult.

Aren't Horn clauses enough to make a language Turing complete?
No, it is just a finite set of variables. The problem being solved is "given a known set of variables that are true, find the minimal set of other variables that need to be true" and this is basically a flood fill algorithm.
Could you elaborate on the "find the minimal set of other variables that need to be true" part? What other variables are you referring to? "need to be true" for what exactly?
The user of the build system defines boolean variables for any aspect of the build system that they think might be helpful. There are booleans for types of compiler flags (e.g. should I include debug symbols?) and booleans for specific chunks of code (e.g. should I build the unit test code) and booleans for how the thing is being built (e.g. am I building for linux? am I writing a makefile?)

The build system is then able to figure out a starting set of booleans which must be true (e.g. "I am writing the makefiles for the unit test project so it builds on linux in a debug build"). This translates into some starting assignment of boolean values to the boolean variables.

However, it is most likely, that this assignment will not satisfy every horn clause. For example: "if (debug build) then (include debug symbols = TRUE)" might be a clause, and so known that we have a debug build implies we need to set our boolean for including debug symbols to be true (we can break this out so we can then easily have things like optimized builds with debug symbols). Then once you know that you have debug symbols, we might have a clause like "if (debug symbols AND compiling with gcc AND writing a makefile) then (<magic side effect goes here>)"

Similar stuff happens for dependencies. If some code uses the maths library, then the place where the magic side effect says "add this file to the list of files being compiled" will also have a "link with maths library = TRUE" part to it, so then anyone who sets the boolean to get that file, will magically also be told that they need the maths library, and some other part of the system will be responsible for figuring out what that means (e.g. adding "-lm" to some makefiles)

The syntax for it is all simplified down so it doesn't really look like you are doing boolean arithmetic everywhere, but the semantics are mostly what I'm describing.

Now I understand, thanks so much! :)
Sure, you can avoid it. But I bet if it was turing complete you would not suffer for it either.

And the answer to 'how to build this' should be answerable by "run build_system". Any further details represents a failure of the build system (what you describe of having multiple build systems for each platform/IDE is especially hellish). Even a relatively small amount of code is likely to have details the matter beyond just 'compile and link'. Especially in embedded, where even if the codebase is all the code that's running on the system (no external libraries), it's far from trivial to actually get a working binary just by invoking the compiler.

Actually, Turing completeness would make it worse. This system means that when you pull in things from multiple places and multiple teams, the interactions between the different things would be really trivial to understand. e.g. if you want to know "why is this preprocessor macro being set?" then the tooling can tell you exactly why - even if the answer spans multiple modules from different teams. But the other side effect was that people tended to not do anything crazy or even just different (compare with the whole find_package business in cmake where different packages do things in different ways and frequently have conflicting compile/link requirements)

I am very familiar with embedded stuff, and this whole system was built precisely because of that. Some people are developing in visual studio so give them a good visual studio project that works well, some are working with a proprietary garbage toolchains, so you need to let them write makefiles and have enough flexibility to let them encode how to do that in the build system (e.g. all the different quirks you get for figuring out header file dependencies). The policy I used was "every toolchains needs some lowest common denominator support" and then "once you have that, let every toolchains be as good as possible with equivalences given abstractions". It did appear to work.

I think you may have misinterpreted me on the multiple build system thing. There is one build system and it produces files that each ide/toolchains can work with. So it can make xcode, visual studio, makefile, etc. No one edits those by hand, just like cmake, gyp, etc