by gavinhoward 1091 days ago
Disclaimer: I'm making a competing build system.

I won't tell you specific build systems, but I will tell you what to look for.

Look for power. Unlimited power. [1]

Usually, this means a few things:

1. The build system uses a general-purpose language, even if the language needs features to be added.

2. The build system does not reduce the power of the general-purpose language. For example, say it starts with Python but prohibits recursion. In that case, you know it is not unlimited power. Looking at you, Starlark.

3. The build can be dynamically changed, i.e., the build is not statically determined before it even begins.

4. Each task has unlimited power. This means that the task can use a general-purpose language, not just run external processes.

5. And there has to be some thought put into user experience.

Why are these important? Well, let's look at why with CMake, which fails all of them.

For #1, CMake's language started as a limited language for enumerating lists. (Hence, CMakeLists.txt is the file name.) And yet, it's grown to be as general-purpose as possible. Why? Because when you need an if statement, nothing else will do, and when you need a loop, nothing else will do.

And that brings us to #2: if CMake's language started limited, are there still places where it's limited? I argue yes, and I point to the article where it says that you couldn't dynamically call functions until recently. There are probably other places.

For #3, CMake's whole model precludes it. CMake generates the build upfront and then expects another build system to actually execute it. There is no changing the build without regenerating it. (And even then, CMake did a poor job until the addition of `--fresh`.) A fully dynamic build should be able to add targets and make other targets depend on those new targets dynamically, among other things.
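
To make #3 concrete, here is a minimal sketch of a build engine in which a running task can add new targets and attach them as dependencies of another target. Everything in it (`Builder`, `add_target`, `add_dep`, the target names) is hypothetical and not taken from any real build system; it also has no cycle detection, caching, or parallelism.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Target:
    deps: list[str]
    action: Callable[[Builder, str], None]


class Builder:
    """Toy build engine whose graph may grow while it is being built (hypothetical)."""

    def __init__(self) -> None:
        self.targets: dict[str, Target] = {}
        self.done: list[str] = []  # target names in the order their actions ran

    def add_target(self, name: str, deps: list[str], action: Callable[[Builder, str], None]) -> None:
        self.targets[name] = Target(list(deps), action)

    def add_dep(self, name: str, dep: str) -> None:
        self.targets[name].deps.append(dep)

    def build(self, name: str) -> None:
        if name in self.done:
            return
        target = self.targets[name]
        i = 0
        # Index-based loop: deps appended while an earlier dep builds are still visited.
        while i < len(target.deps):
            self.build(target.deps[i])
            i += 1
        target.action(self, name)
        self.done.append(name)


def noop(builder: Builder, name: str) -> None:
    pass


def generate(builder: Builder, name: str) -> None:
    # Pretend a code generator just discovered two new object files.
    for obj in ("a.o", "b.o"):
        builder.add_target(obj, [], noop)
        builder.add_dep("app", obj)


b = Builder()
b.add_target("app", ["gen"], noop)
b.add_target("gen", [], generate)
b.build("app")
print(b.done)  # ['gen', 'a.o', 'b.o', 'app']
```

Because `gen` runs while `app` is still walking its dependency list, `app` ends up depending on targets that did not exist when the build started. In a model where the build is generated upfront, that would require a regeneration step first.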

For #4, obviously CMake limits what tasks can do because Ninja and Make limit tasks to running commands.

As another example, to implement a LaTeX target, you technically need a while loop to iterate until a fixed point. To do that with Make and Ninja, you have to jump through hoops or use an external script that may not work on all platforms.
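
A sketch of that while loop, assuming a hypothetical `step` callback that performs one LaTeX pass and returns the new contents of the `.aux` file. The passes are simulated here so the example runs without a TeX installation, and the pass cap is an arbitrary safety limit, since some documents never settle.

```python
from typing import Callable, Optional


def run_until_fixed_point(step: Callable[[], str], max_passes: int = 5) -> int:
    """Call `step` until it returns the same value twice in a row.

    Returns the number of passes run. In a real LaTeX rule, `step` would run
    the LaTeX engine once and return the contents of the .aux file.
    """
    previous: Optional[str] = None
    for passes in range(1, max_passes + 1):
        current = step()
        if current == previous:
            return passes
        previous = current
    raise RuntimeError(f"no fixed point after {max_passes} passes")


# Simulated .aux contents after each pass: empty, then labels resolved, then stable.
aux_after_pass = iter(["", r"\newlabel{intro}{{1}{1}}", r"\newlabel{intro}{{1}{1}}"])
print(run_until_fixed_point(lambda: next(aux_after_pass)))  # 3
```

In a build system whose tasks can run general-purpose code, this loop lives inside the task itself instead of in a separate shell script.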

CMake obviously fails #5, and to see how much other build systems fail it, just look for comments pouring hate on those build systems. CMake fails the most, but I haven't seen one that passes yet.

As an example, CMake barely got a debugger. Wow! Cool! It's been 20 years! My build system will have a debugger in public release #2 (the one after the MVP) that will be capable of outputting to multiple TTYs like gdb-dashboard. [2] They should have had this years ago!

Should other comments suggest specific build systems, like the one that suggested Bazel, judge them by this list. Some will be better than others. None will pass everything, IMO, which is why I'm making my own.

[1]: https://youtube.com/watch?v=Sg14jNbBb-8

[2]: https://github.com/cyrus-and/gdb-dashboard


Are you being serious, or is this an elaborate joke?

My experience with build systems is that, by going through the pain of trying to implement your own build system with some of the same desiderata as Bazel (reliability, performance), you end up independently discovering the same features Bazel has (like the purposefully limited scripting system). Going in the opposite direction seems like the best way to purposefully design a bad build system, but maybe I misunderstand what your goals are here?

Deadly serious.

My goal is to make a build system that the majority of people don't hate.

I understand your concerns about a fully-powerful system. I know that's why Bazel is a favorite.

But Nix has the same reliability with a full Turing-complete language, and I feel like I can recover the performance by using C instead of Java.

In addition, I understand the purpose of limited languages in build systems. I really do.

The first public release of my build system (the MVP) will have the ability to restrict the power of the language, and it will do so by default because limiting power when working with other people is noice. [1]

You may ask why I make a point of power if my own language will be hobbled by default. Because users will still be able to remove the hobble if needed. And sometimes, it will be needed.

But I'll still go above and beyond; not only will users be able to choose between a restricted language or a powerful one, they'll be able to choose how powerful the language needs to be.

Only need a POSIX Makefile replacement? That's the default. Need if statements, but only if statements? Got you covered. Need functions? No problem. Need loops? Just say the word. No dynamic build stuff needed? No worries. Wanna go full Palpatine? Yes, Master.
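
One way such tiers could be checked is sketched below. This is a guess at the idea, not the actual design: the tier names, their linear ordering, and the feature-to-tier table are all assumptions made for illustration.

```python
from enum import IntEnum


class Power(IntEnum):
    """Hypothetical power tiers, least to most powerful (the ordering is an assumption)."""
    MAKEFILE = 0      # POSIX Makefile equivalent: targets, prerequisites, commands
    CONDITIONALS = 1  # adds if statements
    FUNCTIONS = 2     # adds user-defined functions
    LOOPS = 3         # adds loops
    DYNAMIC = 4       # adds targets/dependencies created during the build
    UNLIMITED = 5     # "full Palpatine"


# Hypothetical mapping from language features to the tier that unlocks them.
REQUIRED = {
    "if": Power.CONDITIONALS,
    "def": Power.FUNCTIONS,
    "while": Power.LOOPS,
    "add_target_during_build": Power.DYNAMIC,
}


class PowerError(Exception):
    pass


def check_feature(feature: str, allowed: Power) -> None:
    # Unknown features are treated as needing full power, so the check fails closed.
    needed = REQUIRED.get(feature, Power.UNLIMITED)
    if needed > allowed:
        raise PowerError(f"{feature!r} needs {needed.name}; project allows {allowed.name}")


check_feature("if", Power.FUNCTIONS)  # fine: conditionals sit below FUNCTIONS
try:
    check_feature("while", Power.FUNCTIONS)
except PowerError as err:
    print(err)  # 'while' needs LOOPS; project allows FUNCTIONS
```

A linear scale is the simplest model to explain, but it cannot express "loops without functions"; a set of independent feature flags would allow any combination at the cost of more configuration.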

In other words, I hear you, and you are right for the vast majority of cases. But for that other minority of cases, power must exist, and there is no alternative, so it will be available.

This is related to Joel Spolsky's assertion that while everybody only uses 20% of the features, nobody uses the same 20% [2]:

> A lot of software developers are seduced by the old “80/20” rule. It seems to make a lot of sense: 80% of the people use 20% of the features. So you convince yourself that you only need to implement 20% of the features, and you can still sell 80% as many copies.

> Unfortunately, it’s never the same 20%. Everybody uses a different set of features. In the last 10 years I have probably heard of dozens of companies who, determined not to learn from each other, tried to release “lite” word processors that only implement 20% of the features. This story is as old as the PC. Most of the time, what happens is that they give their program to a journalist to review, and the journalist reviews it by writing their review using the new word processor, and then the journalist tries to find the “word count” feature which they need because most journalists have precise word count requirements, and it’s not there, because it’s in the “80% that nobody uses,” and the journalist ends up writing a story that attempts to claim simultaneously that lite programs are good, bloat is bad, and I can’t use this...thing ’cause it won’t count my words.

I hope that makes sense. And thank you for clarifying in your second paragraph.

[1]: https://youtube.com/watch?v=GGYpESpbHis

[2]: https://www.joelonsoftware.com/2001/03/23/strategy-letter-iv...

What advantages does this system provide? Could you give a motivating example?

Bazel’s BUILD.bazel scripts are restricted, but that part is the middle of a “sandwich” which handles the 90% use cases. If you want unfettered execution, you get that in the repository rule phase and the actual build actions (the two slices of bread in our sandwich). This allows you to define build rules dynamically (as long as you can do it before the main part of the build runs) and allows you to run arbitrary code in your build actions (as long as you specify a superset of the inputs and outputs, and your outputs are disjoint).
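
The two conditions at the end of that paragraph (declared inputs cover everything an action reads, and no two actions claim the same output) can be written as explicit checks. This is a toy model to show what the rules mean, not how Bazel enforces them; `Action`, `check_disjoint_outputs`, and `check_reads` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    inputs: frozenset[str]   # declared superset of what the action may read
    outputs: frozenset[str]  # files the action promises to write


def check_disjoint_outputs(actions: list[Action]) -> None:
    """Reject two actions that claim the same output file."""
    owner: dict[str, str] = {}
    for action in actions:
        for path in sorted(action.outputs):
            if path in owner:
                raise ValueError(f"{path} is produced by both {owner[path]} and {action.name}")
            owner[path] = action.name


def check_reads(action: Action, files_read: set[str]) -> None:
    """Reject an action that read a file it did not declare as an input."""
    undeclared = files_read - action.inputs
    if undeclared:
        raise ValueError(f"{action.name} read undeclared inputs: {sorted(undeclared)}")


compile_main = Action("compile_main", frozenset({"main.c", "util.h", "config.h"}), frozenset({"main.o"}))
link = Action("link", frozenset({"main.o"}), frozenset({"app"}))
check_disjoint_outputs([compile_main, link])     # ok
check_reads(compile_main, {"main.c", "util.h"})  # ok: a subset of the declared inputs
```

Declaring `config.h` even though this run did not read it is exactly the "superset" allowance: over-declaring is safe, under-declaring is what the check rejects.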

Good questions.

To be frank, Bazel is the build system that comes closest. The biggest selling point of mine against Bazel would be usability (I hope). So really, I would suggest people stay on Bazel if they already are.

(In fact, I would suggest that CMake users stay on CMake. I only want to capture new projects to start. If my build system then proves itself, people who need to will switch by themselves.)

About dynamic capabilities, that's not quite what I mean by defining build rules dynamically. Your qualification ("as long as you can do it before the main part of the build runs") throws out everything I meant. In addition, CMake can do that too.

However, it does sound like Bazel can otherwise run arbitrary code in build actions. Those restrictions you mention are fundamental to sandboxed build systems, so I don't count them against Bazel.

Yes, my build system will be able to be hermetic and do sandboxing. It will do it in two ways, one of which will be like Bazel. The other will be a sandbox in the interpreter itself.

These will apply both per build rule and globally. There will be no downloading stuff from the Internet if you don't allow it, and build rules will only be able to use outside commands that you allow. (For example, you could allow only the C compiler for a C project.) The sandbox could be even tighter and will be runtime-based: actions could be rejected at runtime based on runtime values.
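
A rough sketch of a per-rule policy with an allowlist plus a runtime check on the actual arguments. `Sandbox`, `authorize`, and `outputs_stay_in_build_dir` are hypothetical names, and this only decides whether a command may run. A command-name allowlist alone is easy to escape if an allowed tool can launch other programs, so real isolation would still need OS-level sandboxing.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Sandbox:
    """Hypothetical per-rule policy: an allowlist plus an optional runtime check."""
    allowed_commands: frozenset[str]
    runtime_check: Optional[Callable[[Sequence[str]], bool]] = None

    def authorize(self, argv: Sequence[str]) -> None:
        # Static check: only allowlisted executables may run at all.
        if argv[0] not in self.allowed_commands:
            raise PermissionError(f"command {argv[0]!r} is not allowed")
        # Runtime check: inspect the arguments actually computed during the build.
        if self.runtime_check is not None and not self.runtime_check(argv):
            raise PermissionError("rejected at runtime: " + " ".join(argv))


def outputs_stay_in_build_dir(argv: Sequence[str]) -> bool:
    """Reject any `-o PATH` whose normalized PATH is outside build/."""
    for flag, path in zip(argv, argv[1:]):
        if flag == "-o" and not os.path.normpath(path).startswith("build" + os.sep):
            return False
    return True


c_rules = Sandbox(frozenset({"cc"}), outputs_stay_in_build_dir)
c_rules.authorize(["cc", "-c", "main.c", "-o", "build/main.o"])  # allowed
for argv in (["curl", "https://example.com"], ["cc", "-c", "x.c", "-o", "/tmp/x.o"]):
    try:
        c_rules.authorize(argv)
    except PermissionError as err:
        print(err)
```

The first rejected command fails the allowlist; the second uses an allowed compiler but is rejected only once its runtime arguments are known.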

So if there's a smaller selling point, I hope it would be better protection against malicious build scripts.

A motivating example: headers in C and C++.

What headers a file may use can depend on the build configuration (through preprocessor defines and such).

Yes, you can make a rule to run the preprocessor on the file, then make a separate rule to run the compiler on the preprocessed file. But there is no way to make the preprocessed file depend on the headers because its dependencies must be defined before the build and the list of included files is generated during the build.

"Okay, but why can't you just generate the list of files on an initial build and just use that in later builds?"

You can, and that is what DJB's redo does. But say that you change a header A that is included by a header B, which is included by a file C. C knows it depends on B. It may know that it actually depends on header A too, but if it doesn't, it won't get rebuilt.
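
The failure in that example comes down to which dependency set gets recorded for C. If only C's direct includes are recorded, a change to A is invisible; if the transitive closure is recorded, the change is caught. The file names and the `INCLUDES` table below are made up for illustration.

```python
# Hypothetical include graph from the example: C includes B, B includes A.
INCLUDES: dict[str, list[str]] = {"c.c": ["b.h"], "b.h": ["a.h"]}


def direct_deps(source: str) -> set[str]:
    return set(INCLUDES.get(source, []))


def transitive_deps(source: str) -> set[str]:
    """All files reachable through #include edges (iterative depth-first search)."""
    seen: set[str] = set()
    stack = [source]
    while stack:
        for dep in INCLUDES.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


changed = "a.h"
print(changed in direct_deps("c.c"))      # False: c.c would not be rebuilt
print(changed in transitive_deps("c.c"))  # True
```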

Motivating example 2: say you have a language with packages, like Python, except that it's compiled.

You have the main program that imports packages. It can dynamically generate targets for imported packages and dynamically depend on them. And do this recursively.

You are hard at work doing development. You already have done several builds. You change one file to import a new package you just created. Do you need to change the build file or do a clean build? Nope. The task recognizes the new dependency, suspends itself to build the new package, and then resumes. You are none the wiser.
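
Python generators are one way to sketch "suspends itself to build the new package, and then resumes": each compile task yields the name of a package it needs, and the scheduler builds that package before resuming the task. `SOURCES` stands in for parsing real import statements, and all names are hypothetical. There is no caching between runs and no cycle detection (an import cycle would recurse until Python's recursion limit).

```python
from typing import Generator

# Stand-in for the import statements found in each package's source.
SOURCES: dict[str, list[str]] = {
    "main": ["net", "util"],
    "net": ["util"],
    "util": [],
}


def compile_package(name: str) -> Generator[str, None, str]:
    """Compile task: yields each imported package, returns the artifact name."""
    for imported in SOURCES[name]:
        yield imported  # suspend here until `imported` has been built
    return f"{name}.o"


def build(name: str, built: dict[str, str], order: list[str]) -> str:
    if name in built:
        return built[name]
    task = compile_package(name)
    try:
        request = next(task)
        while True:
            build(request, built, order)  # build the dependency the task asked for
            request = task.send(None)     # resume the suspended task
    except StopIteration as stop:
        built[name] = stop.value
        order.append(name)
        return stop.value


order: list[str] = []
build("main", {}, order)
print(order)  # ['util', 'net', 'main']

# Edit main to import a brand-new package; no build file changes.
SOURCES["json"] = []
SOURCES["main"].append("json")
order2: list[str] = []
build("main", {}, order2)
print(order2)  # ['util', 'net', 'json', 'main']
```

The dependency list is read from the (stand-in) source each time a task runs, so the new package is discovered and built without anyone touching a build description.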

Even better, your dependency information is contained in the actual source, not your build system. There is no duplication. And it "just works."

> But there is no way to make the preprocessed file depend on the headers because its dependencies must be defined before the build and the list of included files is generated during the build.

The actual dependencies of a build action in Bazel are defined at build-time, dynamically, after the action runs. So you do not need to know the exact dependencies ahead of time, you just need a superset—not something you can really avoid, as far as I can tell, because the way that #include will search multiple paths.
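
For C and C++, one common way to learn the real dependencies after an action runs is to have the compiler write them out: GCC and Clang accept `-MD` (or `-MMD`, which omits system headers) together with `-MF <file>`, and emit a Make-format rule listing the headers the compile actually read. A minimal parser for that format might look like this. It is a sketch, not Bazel's code, and it assumes POSIX paths, one target per rule, and no escaped spaces in file names.

```python
def parse_depfile(text: str) -> dict[str, list[str]]:
    """Parse Make-style rules such as those written by `cc -MD -MF out.d`.

    Simplifications: POSIX paths only, one target per rule, no escaped spaces.
    """
    # Join backslash-newline continuations into one logical line.
    logical = text.replace("\\\r\n", " ").replace("\\\n", " ")
    rules: dict[str, list[str]] = {}
    for line in logical.splitlines():
        if ":" not in line:
            continue
        target, _, prereqs = line.partition(":")
        rules[target.strip()] = prereqs.split()
    return rules


depfile = "build/main.o: src/main.c src/config.h \\\n  src/util.h\n"
print(parse_depfile(depfile)["build/main.o"])
# ['src/main.c', 'src/config.h', 'src/util.h']
```

The parsed list can then replace the declared superset for the next incremental build, which is the "dependencies defined after the action runs" idea described above.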

For languages like Go, where I can just import packages, the build file will be updated automatically to match the source files. This is done using a tool called Gazelle. I know this is possible in other languages as well, such as C++, I just don’t use those tools.

“Dependency information in the actual source” is what you get with Gazelle. There is some duplication in the build files, but I like this and find it useful—you can redirect dependencies to be fulfilled by other targets than what would be the default, for example.
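
A toy version of that idea: find the imports in a source file with Python's standard `ast` module, map each to a default target, and let an override table redirect a dependency to a different target. This is not Gazelle's implementation or rule syntax (Gazelle has its own directives for overrides, such as `# gazelle:resolve`); the `//pkg/path` label scheme and the override table are assumptions for illustration.

```python
import ast


def imported_modules(source: str) -> list[str]:
    """Return the absolute module names imported by Python source code."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)  # relative imports (level > 0) are skipped
    return sorted(names)


def infer_deps(source: str, overrides: dict[str, str]) -> list[str]:
    """Map each import to a label; an override redirects it to another target."""
    return [overrides.get(m, "//" + m.replace(".", "/")) for m in imported_modules(source)]


src = "import json\nfrom mylib.net import fetch\n"
print(infer_deps(src, {"json": "//third_party/fastjson"}))
# ['//third_party/fastjson', '//mylib/net']
```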

> So you do not need to know the exact dependencies ahead of time, you just need a superset—not something you can really avoid, as far as I can tell, because the way that #include will search multiple paths.

You can avoid it with the system I laid out. You don't need to specify any dependencies up front; instead, you specify them while the target is executing. You could even create the dependency while the target is executing.

I forgot about the multiple include paths, but this is another reason that dynamic dependencies may be useful.

That's not to say that Bazel is bad! It's just a different model. And dynamic dependencies may not be useful. That's perfectly fine! It's also why I would only suggest trying out such a build system on a new project. Don't break what's not broken!

> I know this is possible in other languages as well, such as C++, I just don’t use those tools.

Not really; you can't know what compilation unit will contain a function in C and C++. Well, you could try to hack it with grep or something, but I consider that less desirable.

> “Dependency information in the actual source” is what you get with Gazelle. There is some duplication in the build files, but I like this and find it useful—you can redirect dependencies to be fulfilled by other targets than what would be the default, for example.

This is really cool!

I need to clarify that you would still be able to redirect dependencies. Unlimited power is unlimited, after all.

But Gazelle sounds cool, and Bazel sounds cool. I don't mean to put them down.