by gavinhoward 1091 days ago
Good questions.

To be frank, Bazel is the build system that comes closest. My biggest selling point over Bazel would be usability (I hope). So really, I would suggest people stay on Bazel if they already are.

(In fact, I would suggest that CMake users stay on CMake. I only want to capture new projects to start. If my build system then proves itself, people who need to will switch by themselves.)

About dynamic capabilities, that's not quite what I mean by defining build rules dynamically. Your qualification ("as long as you can do it before the main part of the build runs") throws out everything I meant. In addition, CMake can do that too.

However, it does sound like Bazel can otherwise run arbitrary code in build actions. Those restrictions you mention are fundamental to sandboxed build systems, so I don't count them against Bazel.

Yes, my build system will be able to be hermetic and do sandboxing. It will do it in two ways, one of which will be like Bazel. The other will be a sandbox in the interpreter itself.

These can be applied per build rule or globally. There will be no downloading stuff from the Internet if you don't allow it, and build rules will only be able to use outside commands that you allow. (For example, you could allow only the C compiler for a C project.) The sandbox could be even tighter and will be runtime-based: actions could be rejected at runtime based on runtime values.
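A minimal sketch of the allowlist idea, assuming a hypothetical `Sandbox` policy object. None of these names come from the build system being discussed; the sketch only shows how a command or download could be rejected at runtime based on the value the rule actually tries to use.

```python
import shlex


class SandboxError(Exception):
    """Raised when a build rule tries something the policy forbids."""


class Sandbox:
    # Hypothetical policy object; the names are illustrative only.
    def __init__(self, allowed_commands, allow_network=False):
        self.allowed_commands = set(allowed_commands)
        self.allow_network = allow_network

    def check_command(self, command_line):
        """Return the parsed argv if the program is allowed, else raise."""
        argv = shlex.split(command_line)
        if not argv or argv[0] not in self.allowed_commands:
            raise SandboxError(f"command not allowed: {command_line!r}")
        return argv

    def check_download(self, url):
        """Return the URL if network access is allowed, else raise."""
        if not self.allow_network:
            raise SandboxError(f"network access not allowed: {url}")
        return url


# A C project that may only invoke the C compiler and may not download anything.
c_project = Sandbox(allowed_commands={"cc"})
```

With this policy, `c_project.check_command("cc -c foo.c -o foo.o")` returns the argument list, while a call naming any other program, or any call to `check_download`, raises `SandboxError`.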

So if there's a smaller selling point, I hope it would be better protection against malicious build scripts.

A motivating example: headers in C and C++.

What headers a file may use can depend on the build configuration (through preprocessor defines and such).

Yes, you can make a rule to run the preprocessor on the file, then make a separate rule to run the compiler on the preprocessed file. But there is no way to make the preprocessed file depend on the headers because its dependencies must be defined before the build and the list of included files is generated during the build.

"Okay, but why can't you just generate the list of files on an initial build and just use that in later builds?"

You can, and that is what DJB's redo does. But say that you change a header A that is included by a header B, which is included by a file C. C knows it depends on B. It may know that it actually depends on header A too, but if it doesn't, it won't get rebuilt.
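The A/B/C case can be sketched in a few lines. This is only an illustration with made-up file names (`c.c`, `b.h`, `a.h`) and hypothetical helper functions, not how redo or any other tool stores its dependency records.

```python
def transitive_deps(target, direct_includes):
    """Return every file reachable from target through include edges."""
    seen = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for dep in direct_includes.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def needs_rebuild(target, changed_file, recorded_deps):
    """A target is rebuilt only if the changed file is in its recorded deps."""
    return changed_file in recorded_deps[target]


# c.c includes b.h, and b.h includes a.h.
direct_includes = {"c.c": ["b.h"], "b.h": ["a.h"]}

# Recording only direct includes misses a.h; recording the closure does not.
direct_only = {"c.c": set(direct_includes["c.c"])}
full_closure = {"c.c": transitive_deps("c.c", direct_includes)}
```

With `direct_only`, a change to `a.h` does not trigger a rebuild of `c.c`; with `full_closure`, it does.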

Motivating example 2: say you have a language with packages, like Python, except that it's compiled.

You have the main program that imports packages. It can dynamically generate targets for imported packages and dynamically depend on them. And do this recursively.

You are hard at work doing development. You have already done several builds. You change one file to import a new package you just created. Do you need to change the build file or do a clean build? Nope. The task recognizes the new dependency, suspends itself to build the new package, and then resumes. You are none the wiser.
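One way to picture "suspends itself, then resumes" is with Python generators. This is a rough sketch under stated assumptions: `compile_package` and `build` are hypothetical names, the `sources` dict stands in for parsing import statements out of real source files, and there is no cycle detection or staleness checking.

```python
def compile_package(name, sources):
    """Hypothetical task: discovers imports while running and yields each one."""
    compiled_imports = []
    for imported in sources[name]:       # stands in for parsing import statements
        artifact = yield imported        # suspend until the dependency is built
        compiled_imports.append(artifact)
    return f"{name}({','.join(compiled_imports)})"


def build(name, sources, built, order):
    """Build a package, building each dependency it requests along the way."""
    if name in built:
        return built[name]
    task = compile_package(name, sources)
    try:
        request = next(task)             # run until the first dependency request
        while True:
            # Build the requested package, then resume the suspended task.
            request = task.send(build(request, sources, built, order))
    except StopIteration as finished:
        built[name] = finished.value
        order.append(name)
        return finished.value


sources = {"main": ["util", "net"], "util": [], "net": ["util"]}
built, order = {}, []
result = build("main", sources, built, order)
```

No build file lists the dependencies here; adding a new entry to what `main` imports is enough for `build` to pick it up the next time `main` is built.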

Even better, your dependency information is contained in the actual source, not your build system. There is no duplication. And it "just works."


> But there is no way to make the preprocessed file depend on the headers because its dependencies must be defined before the build and the list of included files is generated during the build.

The actual dependencies of a build action in Bazel are defined at build-time, dynamically, after the action runs. So you do not need to know the exact dependencies ahead of time; you just need a superset. That is not something you can really avoid, as far as I can tell, because of the way that #include will search multiple paths.
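As a rough illustration of the "declare a superset, prune after the action runs" model described here, consider the sketch below. The `Action` class and its methods are hypothetical and are not Bazel's API; they only model the bookkeeping.

```python
class Action:
    """Hypothetical record: declared inputs are a superset, used inputs are pruned."""

    def __init__(self, declared_inputs):
        self.declared_inputs = set(declared_inputs)
        self.used_inputs = None  # unknown until the action has run once

    def record_run(self, discovered):
        """Store the inputs the action actually read; they must be declared."""
        discovered = set(discovered)
        undeclared = discovered - self.declared_inputs
        if undeclared:
            raise ValueError(f"undeclared inputs used: {sorted(undeclared)}")
        self.used_inputs = discovered

    def is_stale(self, changed_file):
        """Before the first run, any declared input counts; afterwards, only used ones."""
        relevant = self.declared_inputs if self.used_inputs is None else self.used_inputs
        return changed_file in relevant


compile_main = Action({"main.c", "a.h", "b.h", "c.h"})
```

After `compile_main.record_run({"main.c", "a.h"})`, a change to `c.h` no longer marks the action stale, while a change to `a.h` still does.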

For languages like Go, where I can just import packages, the build file will be updated automatically to match the source files. This is done using a tool called Gazelle. I know this is possible in other languages as well, such as C++, I just don’t use those tools.

“Dependency information in the actual source” is what you get with Gazelle. There is some duplication in the build files, but I like this and find it useful—you can redirect dependencies to be fulfilled by other targets than what would be the default, for example.

> So you do not need to know the exact dependencies ahead of time; you just need a superset. That is not something you can really avoid, as far as I can tell, because of the way that #include will search multiple paths.

You can avoid it with the system I laid out. You don't need to specify any dependencies up front; instead, you specify them while the target is executing. You could even create the dependency while the target is executing.

I forgot about the multiple include paths, but this is another reason that dynamic dependencies may be useful.

That's not to say that Bazel is bad! It's just a different model. And dynamic dependencies may not be useful. That's perfectly fine! It's also why I would only suggest trying out such a build system on a new project. Don't break what's not broken!

> I know this is possible in other languages as well, such as C++, I just don’t use those tools.

Not really; you can't know what compilation unit will contain a function in C and C++. Well, you could try to hack it with grep or something, but I consider that less desirable.

> “Dependency information in the actual source” is what you get with Gazelle. There is some duplication in the build files, but I like this and find it useful—you can redirect dependencies to be fulfilled by other targets than what would be the default, for example.

This is really cool!

I need to clarify that you would still be able to redirect dependencies. Unlimited power is unlimited, after all.

But Gazelle sounds cool, and Bazel sounds cool. I don't mean to put them down.

I’m not sure what you mean by specifying dependencies while the target is executing. The thorn here is generated headers—if they aren’t built before your compiler runs, then you are stuck.

> Not really; you can't know what compilation unit will contain a function in C and C++. Well, you could try to hack it with grep or something, but I consider that less desirable.

This is solvable and has in fact been solved. You use a function in your C++ file, the analysis system knows which header contains that function declaration, and the build system knows which library must be linked in for the header file. This is basically how it works in Go, with some extra steps to associate headers with libraries. But all the pieces are there—you do not need grep, if you want to implement a similar system yourself.

The catch here is that these systems are a bit inexact: any given function could be supplied by more than one library, and the header files may require some specific ordering to work correctly. The solution is to store the dependencies in the build scripts, rather than try to figure them out from sources each time. The general problem, of figuring out the correct headers and libraries necessary to compile a given piece of C++ code, is just too much of a pain in the ass to make it completely automatic; you want a human in the loop. It’s not just a problem with exactness. You also have multiple configurations with their own preprocessor flags, and you have dependencies which are specified indirectly but which should be direct (how do you detect that?).
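To make the inexactness concrete, here is a small sketch of a header-to-library lookup with a human-supplied override. The function name, the index, and the target labels are all made up for illustration and do not describe any real tool's data model.

```python
def resolve_libraries(headers, header_to_libraries, overrides=None):
    """Return (resolved, ambiguous) mappings for the given headers."""
    overrides = overrides or {}
    resolved, ambiguous = {}, {}
    for header in headers:
        if header in overrides:
            # A choice stored in the build script wins over inference.
            resolved[header] = overrides[header]
            continue
        candidates = header_to_libraries.get(header, [])
        if len(candidates) == 1:
            resolved[header] = candidates[0]
        else:
            # Zero or several providers: a human has to decide.
            ambiguous[header] = list(candidates)
    return resolved, ambiguous


# Hypothetical index: json.h is provided by two different libraries.
index = {
    "zlib.h": ["//third_party/zlib"],
    "json.h": ["//third_party/json_a", "//third_party/json_b"],
}
resolved, ambiguous = resolve_libraries(["zlib.h", "json.h"], index)
```

Passing `overrides={"json.h": "//third_party/json_b"}` resolves the ambiguous header, which is the role the dependencies stored in the build script play.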

The ecosystem, such as it is, is a chaotic mixture of tools used interactively during development or non-interactively during the build. One of the super useful properties of Bazel build files is that you can modify them programmatically, using a tool called Buildozer. This can be used for things like automatic refactoring of your build system, and it can also be used to make automatic changes to the build system as you edit source code. Part of the “sauce” that makes it work is the way rules are rigidly defined in build scripts. As you make build scripts more complicated, it gets harder and harder for the tooling to keep up—and often, that means more manual work to keep everything set up right.

> I’m not sure what you mean by specifying dependencies while the target is executing. The thorn here is generated headers—if they aren’t built before your compiler runs, then you are stuck.

I understand why you think this. It took me a while to understand dynamic dependencies too.

But that is actually not the case. A target that may have dynamic dependencies will run and figure out the required headers or figure out the required imports. At that point, it tells the build system that it needs those dependencies, but if those dependencies are already up-to-date, it can be considered up-to-date too.

The build system checks those dependencies, finds that they are up-to-date, marks the first target as done, and doesn't finish running it.
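A simplified sketch of that flow, assuming hypothetical callbacks: `discover_deps` is the cheap first part of the target, `build_dep` brings one dependency up to date (for example, generating a header) and returns a version stamp for it, and `finish` is the expensive remainder of the target.

```python
def run_target(discover_deps, build_dep, finish, previous_stamps):
    """Discover deps, bring them up to date, and skip the rest if nothing changed."""
    deps = discover_deps()
    # Each dependency is built (if needed) before its stamp is compared.
    current = {dep: build_dep(dep) for dep in deps}
    if current == previous_stamps:
        return "up-to-date", previous_stamps
    finish(deps)
    return "rebuilt", current


# Illustrative state: stamps stand in for hashes or timestamps.
stamps = {"gen.h": 1, "util.h": 1}
finished_runs = []


def discover():
    return ["gen.h", "util.h"]


def build_dep(dep):
    return stamps[dep]


def finish(deps):
    finished_runs.append(list(deps))


status1, seen = run_target(discover, build_dep, finish, {})
status2, seen = run_target(discover, build_dep, finish, seen)
```

The first call runs `finish` because there is no previous record; the second call stops after the comparison, so a generated header never has to exist before the target starts running.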

> The general problem, of figuring out the correct headers and libraries necessary to compile a given piece of C++ code, is just too much of a pain...to make it completely automatic—you want a human in the loop.

This is what I meant. Sure, you can make good assumptions (and in my monorepo, functions are arranged in specific ways in files, so I could do that), but it's not generalizable.

> One of the super useful properties of Bazel build files is that you can modify them programmatically, using a tool called Buildozer. This can be used for things like automatic refactoring of your build system, and it can also be used to make automatic changes to the build system as you edit source code.

This is really cool. It shows that Bazel is just a different model, and in the eyes of many people, better than mine. That's okay! I'm sure that plenty of people would prefer Bazel's model, including yourself. That's great! Diversity of build systems is good.