by ddevault 2368 days ago
For anyone thinking about Bazel for their project/organization... run as fast as you can in the opposite direction. It's easily the most complex and unintuitive build system in the world, and I'm saying that as someone who used SCons. At the last job where I used it, I was on a team of 5 whose responsibilities included Bazel upkeep, which required anywhere from 10 to 50% of our time. This was used by a broader engineering team of 50, working on 3-5 "big" projects and a few dozen small ones.

If you are an organization with a large enough codebase (especially if it's in a monorepo) that you need a shared remote cache of build artifacts, or remote build sharding and execution, and have multiple languages (even protocol buffers) interacting in complex dependencies, then you should run as fast as you can away from less rigorous Blaze-alikes (Pants, Buck, etc.) straight towards Bazel.

Yes, it's complicated, but it's also quite rigorous, and the rigor pays off.

(We at Square had already found a Blaze-alike necessary. We are currently busy converting our Java build from Pants to Bazel.)

I'll never understand the fascination with monorepos.
Once you reach a certain size of codebase, you're either going to be investing significantly in making many repositories work together and look a bit like a monorepo, or you're going to be investing significantly in making working on individual parts of a monorepo more efficient and look a bit like an isolated repo.

Both approaches take a huge amount of work and tooling.

The big selling point of a monorepo is that the time and effort taken to follow strict versioning and upgrade discipline for multiple interdependent projects can be somewhat avoided. On the code side.

If you're looking for a magic bullet argument proving that either approach is strictly better, I'm not the person to ask.
At a certain size, monorepo becomes the worst way to do it except for all the others.

Essentially: version skew across numerous artifacts in a large organization starts to look like the version skew across an industry or ecosystem. The aggregate cost of dealing with it project by project is probably higher, at least that is what most of the biggest tech companies have concluded, than dealing with it at the source level using a monorepo and single-version policy.

Well, don't have version skew then? Require that anything merged to master doesn't break any tests? Require that tests exist in the first place? Google makes it work at a dramatically larger scale. Everything at tip-of-tree is always ready to go.

EDIT: Looks like I've misread the parent's argument as one against monorepo. It was in fact an argument in favor, and one I agree with.

Yeah but Google does that by being a monorepo.
Looks like I've misread the parent's argument as one against monorepo. My error.
Well for one you can commit to multiple projects in a single PR. Makes coordinating changes across projects much easier.
It gives you that illusion; it doesn't solve versioning and deployment orders, and I'd argue that that's the harder part of changes across projects. Polyrepos make messy things...messy.
Deployment ordering at large scale is avoided and usually done by not making breaking changes. 4 phase migrations, always. Roll out new API, update existing software to use new API, wait for everything to stop using old API + backfill, remove old API.
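
To make the four phases concrete, here is a toy sketch. Everything in it is hypothetical (`get_user_v1`, `get_user_v2`, the `calls` counter); it only illustrates the shape of the process: the new API ships next to the old one, the old one stays as a shim while callers move over, and removal is gated on observing zero remaining use.

```python
import warnings
from collections import Counter

# Hypothetical in-process "service"; all names are illustrative assumptions.
calls = Counter()
_USERS = {1: ("Ada", "Lovelace")}


def get_user_v2(user_id: int) -> dict:
    """Phase 1: the new API is rolled out alongside the old one."""
    calls["v2"] += 1
    first, last = _USERS[user_id]
    return {"first": first, "last": last}


def get_user_v1(user_id: int) -> str:
    """Old API, kept as a thin shim over v2 while callers migrate (phases 2-3)."""
    calls["v1"] += 1
    warnings.warn("get_user_v1 is deprecated; use get_user_v2",
                  DeprecationWarning, stacklevel=2)
    user = get_user_v2(user_id)
    return f"{user['first']} {user['last']}"


def safe_to_remove_v1(observed: Counter) -> bool:
    """Phase 4 gate: delete the old API only once nothing called it
    during the observation window."""
    return observed["v1"] == 0
```

Because each phase is backward compatible on its own, no phase depends on two services being deployed in a particular order.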
I agree that gradual adoption of new APIs is the way to go, but once you're doing that you no longer need an atomic commit across all projects.
It pretty much does solve the versioning issue. “Latest, always”. The downside is the abysmal state of monorepo build tools. With multirepos, who updates the downstream repos’ dependency files (e.g., requirements.txt) when an upstream project releases a change? And is the policy “latest, always” or do you support N versions of every package? I would argue that the latter is insane at any scale, and the former leaves you dealing with dependencies manually (someone is updating the downstream repos’ dependency files when an upstream change is released) or you build automation that does it and you’re well on your way to implementing your own monorepo-like build tool.

Everything is hard, unfortunately.

Oddly, this is also one of the bad sides. Committing to two projects, by necessity, means deploying to two workflows. If not more.

Doing that in one repo makes the commit part easier, but hides the complexity of deploying separately. Or to other places.

Not that two repos makes it easy. Just gives a much earlier signal to where it happens.

Or you can have a single workflow that includes all the projects in the repo. I found it's actually easier to do things like wait for project A to deploy before project B.
Only if the safe deployment order is always the same. In any typical server-client deployment, breaking changes can go in either direction, and which one you can deploy first requires some thought. I've seen 3- or 4-stage deployments for some back-and-forth changes.

In my experience, you're required to break changes up into safe individual deployments anyways, so the monorepo doesn't add any benefit in that sense.

There are tradeoffs both ways. With multirepos you likely have a dependency hell problem and you often have to submit and release several PRs for otherwise small updates. With monorepos, (if you want reasonable build times) you have to be able to determine what has changed and what needs to build (including tests, etc) as a result. This is technically true of multirepos as well, but the problem is pushed into git and manual process.
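
The "what has changed and what needs to build" question is essentially a walk over reverse dependencies. A minimal sketch, assuming the dependency graph is already available as a plain dictionary (the target names and the `affected_targets` helper are made up for illustration):

```python
from collections import defaultdict, deque


def affected_targets(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return every target that must be rebuilt or retested when the
    `changed` targets change. `deps` maps a target to its direct deps."""
    # Invert the graph: for each target, record who depends on it.
    rdeps = defaultdict(set)
    for target, direct_deps in deps.items():
        for dep in direct_deps:
            rdeps[dep].add(target)

    # Breadth-first walk over reverse edges, starting at the changed targets.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependant in rdeps[current]:
            if dependant not in affected:
                affected.add(dependant)
                queue.append(dependant)
    return affected


# Hypothetical monorepo layout, for illustration only.
deps = {
    "//lib/core": set(),
    "//lib/auth": {"//lib/core"},
    "//svc/api": {"//lib/auth"},
    "//svc/batch": {"//lib/core"},
    "//tools/lint": set(),
}

print(sorted(affected_targets(deps, {"//lib/auth"})))
```

The hard part in practice is not this traversal but obtaining an accurate `deps` graph in the first place, which is what Blaze-like tools get from their build files (Bazel's query language has an `rdeps` function for the same question).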

Having looked seriously at both options, I think the monorepo world is the right one, but it presently lacks good tooling to sanely model your dependency graph AND create custom build rules while still being affordable for small or medium-sized orgs. Git/hub simply isn’t designed for this kind of modeling and everything I’ve seen built atop it is either way too manual or a kludge. Maybe the “kludge” solutions are actually reasonable, but my confidence is low.

Bazel is the right idea, but its execution disappoints. The documentation is abysmal. Last I checked, they advertised Python 3 support, but it’s been broken for years with no signs of progress. Building custom rules also looked hopelessly complex (by which I mean, “not something our organization can afford to implement and maintain”) but maybe there’s some undocumented happy path that I’m missing out on? These things seem easy enough to implement. We’re using Pants right now, and for all its many similar problems (bugs, documentation, poor code base, difficult extensibility), it at least does a passable job at building Python projects.

I’ve thought about it a fair amount, and I think it’s reasonable to build something simpler that might not meet Google’s use case, but would at least enable small and medium sized shops to play the monorepo game.

rules_python has supported py3 for a while.

The next obvious question is, what would you do to make it simpler? Tons of people have tried (you listed 5), and they all rebuilt the same thing. What features do you drop?

Last I checked (maybe 6 months ago), it definitely _didn't_ support py3, although it was advertised. I thought I was doing something wrong, but there were half a dozen issues in the tracker that indicated it was critically broken.

I understand that "it should be simpler" is a pretty lazy criticism. It's been a while since I audited Bazel and friends, and I've forgotten which issues apply to which tool. Moreover, because of the awful state of the documentation and the messiness of the code base (or perhaps this is just standard quality for Java projects?), it's really difficult to tell whether any given issue is actually a fundamental shortcoming in the application or whether it's simply a knowledge gap.

As far as what I want, keep the starlark configuration file format; implement all rules as starlark libraries (such that no one needs to write Java to extend, and if you must write Java then for goodness' sake fix the plugin interface or document it better or something such that one doesn't need to be a core contributor to implement a plugin--perhaps this is fine for an enterprise audience, but it's not fine for my use case). The rules should call into a base `mktarget()` or similar that takes args like the target's ID (the package:target_name pair), a target type that identifies the code used to build the target, and a dict of args/params that are passed into the aforementioned code. The args/params can be an arbitrarily nested JSON-like type so long as the leaves are primitives (int, string, etc), references to source files, or other targets and all leaves (and transitively, the whole structure) must be hashable such that we can identify a given execution of the build.
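
A rough Python sketch of that core model, to show what "hashable all the way down" could look like. This is not the prototype mentioned in this comment and not Bazel's API; `SourceFile`, `Target`, and the exact signature of `mktarget` are assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFile:
    path: str
    content_hash: str  # assumed to be computed elsewhere


@dataclass(frozen=True)
class Target:
    target_id: str    # the "package:target_name" pair
    target_type: str  # selects the code used to build the target
    digest: str       # covers type, args, and (transitively) dependencies


def _canonical(value):
    """Reduce a nested args structure to a JSON-serializable canonical form."""
    if isinstance(value, Target):
        return {"target": value.target_id, "digest": value.digest}
    if isinstance(value, SourceFile):
        return {"source": value.path, "hash": value.content_hash}
    if isinstance(value, dict):
        return {str(k): _canonical(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_canonical(v) for v in value]
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    raise TypeError(f"unsupported leaf type: {type(value).__name__}")


def mktarget(target_id: str, target_type: str, args: dict) -> Target:
    """Create a target whose digest identifies one execution of the build."""
    payload = json.dumps(
        {"id": target_id, "type": target_type, "args": _canonical(args)},
        sort_keys=True,  # key order must not affect the digest
    )
    return Target(target_id, target_type, hashlib.sha256(payload.encode()).hexdigest())


# Hypothetical usage: a library target and a binary that depends on it.
lib = mktarget("lib:a", "py_library", {"srcs": [SourceFile("lib/a.py", "abc")]})
app = mktarget("app:bin", "py_binary", {"deps": [lib], "main": "app.py"})
```

Because a dependency contributes its own digest, changing a source file's hash changes the digest of every target downstream of it, while reordering dictionary keys changes nothing.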

Beyond that core operating model, the code and the user interface should be clean and well documented. Ideally, small and medium-sized projects shouldn't need to run it in daemon mode to get reasonable performance. This is important because a daemon running on local development machines introduces a larger maintenance burden (there's just more that can go wrong). Language-specific plugins (custom rules, whatever you want to call them) should adhere pretty closely to the conventions of the target language. Lastly, there should be good support for building toplevel artifacts--this means I should be able to build a whole CloudFormation package including lambdas, Docker images, etc just like I would build a JAR or a C++ binary.

I realize that those things are easy enough to say, but the devil is in the details. I've actually gone so far as to prototype the implementation, so I'm confident that those goals are achievable. Unfortunately, it's a pretty significant effort (mostly due to the breadth of project types/languages to support and the nuance/expertise required to support any of them), so I'm bound by free time. If anyone is interested in collaborating or discussing more in-depth, hit me up on Twitter @weberc2 or email me (username at gmail.com).

As of April of this year, python3 was the default for python rules in bazel.
cross building is even worse with many repos. I've been there, done that and it broke so often. now we have everything in one repo and we barely have problems. btw. we are a small shop with less than 5 people, but have a product on metal that requires multiple services (that sometimes interact with each other)

we don't use bazel (yet), because dotnet is not that supported.

I am sure you will when you end up working in a huge organization with intricate and heterogeneous project/team interdependencies.

You will soon experience:

- dependency hell due to transitive and conflicting dependencies

- one backward-incompatible change in some obscure library ends up breaking some other unknown service that happens to transitively depend on it

- the entire codebase will become a mess due to inconsistent code styles and formatting because hey we are developers and we can never agree on anything. Thus each team lead will have their own opinion

- each team will have to maintain its own CI/CD jobs

- heterogeneous builds: maven, node, sbt, webpack, etc ...

the list goes on ...

All (or most of) this mess is solved by centralizing the codebase in a monorepo.

Yeah, I imagine the test is “do you need those things badly enough to dedicate a large portion of a team’s capacity”?
Nix also exists...
Nix is not (yet) suitable for fine-grained (read: file-level) build targets though, due to the lack of recursive Nix and a content-addressed store. This means you don't get early cutoff, and you get mass rebuilds if only one file changes, for example. Both are being worked on actively though.
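
For readers unfamiliar with early cutoff, here is a toy illustration. It is not how Nix or Bazel implement it; the `Builder` class and the two steps are invented for the example. Each step is cached by the hash of its input content, so if an upstream step produces byte-identical output after a change, downstream steps are skipped.

```python
import hashlib


def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


class Builder:
    """Toy builder that caches each step by the content hash of its input."""

    def __init__(self):
        self.cache = {}
        self.runs = []  # names of the steps that actually executed

    def step(self, name, func, input_text):
        key = (name, digest(input_text))
        if key not in self.cache:
            self.runs.append(name)
            self.cache[key] = func(input_text)
        return self.cache[key]


def strip_comments(src: str) -> str:
    lines = [l for l in src.splitlines() if not l.lstrip().startswith("#")]
    return "\n".join(lines)


def compile_step(src: str) -> str:
    return f"compiled<{digest(src)[:8]}>"


def build(builder: Builder, source: str) -> str:
    stripped = builder.step("strip", strip_comments, source)
    return builder.step("compile", compile_step, stripped)


b = Builder()
build(b, "x = 1\n# note\n")
build(b, "x = 1\n# a different note\n")  # only a comment changed
print(b.runs)
```

On the second build only `strip` re-runs: its output is unchanged, so `compile` is served from the cache. A scheme that keyed `compile` on the hash of the original source instead would rebuild it.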
If I understand it correctly (unlikely), Nix has the degree of purely-functional rigor necessary to do this correctly, right? Sounds like it would eventually be awesome for Bazel use cases.
Nix isn't great as a build system, because it throws a lot out and rebuilds everything when something changes. It's intended to get a correct, isolated package installed, not to maximize sharing.

Bazel goes to great pains to only rebuild the minimum necessary for correctness. It's able to do that because bazel build files get a lot more information about the source level dependencies than a Nix file does.

Reading through https://bazel.build/designs/skyframe.html, this sounds pretty much like what would be possible with the aforementioned recursive Nix and content-addressed paths. Bazel might still win in practice since the overhead of a Nix build is pretty high, which gets even more important when you do them recursively for each file.
I've only experienced issues when using bazel with third party package management systems. If you can own all of your source, it and its descendants are easily my favorite build systems. Its features complement modern software development in a very ergonomic way: uniform build language, API for learning about your source, testing your entire code base in every language with one command, hermetic and reproducible builds, distributed builds, and caching I can actually trust.

Using Bazel with external packages on the other hand is one of the most tedious and frustrating endeavors imaginable. If you can vendor all of your source it's much less frustrating. This is extremely manageable in the C and C++ worlds where there aren't really any package managers and you end up needing to do that anyway.

I would advise checking out https://www.tweag.io/posts/2018-03-15-bazel-nix.html for external dependencies
Pretty much the same experience. 2y old monorepo company with >150 engineers now, being slowed down by hacks upon hacks in the usage of the bazel build system with around 5 people at the company understanding how bazel works. And those people are discouraged quickly when trying to improve things because of the sheer brittleness of the existing build files and "opinionated" approaches in the community that don't quite fit our use cases. Even things like enabling remote caching and execution take many months (and even external companies) and still drag on.

Edit: I don't want to come off as too negative about Bazel. But I really think it needs more time and is nowhere near something that I would call a 1.0, let alone a 2.0.

Indeed. If Bazel had a slogan, it would be "Everyone who doesn't do things my way is stupid". That's probably great within Google, but trying to integrate it into a company that has already made conflicting decisions (for good reasons) is hell.
I feel like that slogan could apply to most OSS projects Google dumps on the world.
To each their own I guess.

I’ve been using it for the last 2 years, and I am not doing any project again without it.

Bazel is not that complex if you start a project with it. Migrating to it and learning it at the same time will be hard though since you’re likely to uncover a lot of skeletons.

This comment is not internally consistent.
How so?
> Bazel is not that complex if you start a project with it.

Bazel is quite complex, and when you start a project you do not yet need it. Rarely will an organization start with something like Bazel; they use it because - you hope - they need it.

vs:

> Migrating to it and learning it at the same time will be hard though since you’re likely to uncover a lot of skeletons.

So the bulk of Bazel use cases will revolve around migrating an existing build system to use Bazel instead, and that is hard, because Bazel is difficult and has a very steep learning curve, and requires a lot of work to keep it running.

Tooling should adapt to use cases; if you need to adapt your use cases to the tooling then that's a fault of the tool. If that limits use of the tool to those projects that are started with it then you have already lost the vast majority of your potential audience. So yes, if you start using Bazel right from day #1 then that might be the way to go. But I doubt - and so far have not seen any evidence - that that is the way it is actually used.

It is true that you rarely start with a new build system.

Bazel is hard in the same way Rust is hard. If you port your existing project to it, chances are you will run into issues because you were doing things wrong with respect to hermeticity or reproducibility. It goes really far to make things correct. You may not need it, but when you do it’s a godsend. Or at least it was for a lot of people I talked to. And my own experience as well.

If your project is vanilla enough, things will go mostly smoothly and the benefit will be immediate (ie bazel clean is a legend).

Think of Bazel as a framework. If you do things its way, it will spoil you. But sometimes a framework is not what you need. That said, if you’re happy with your current system, then good for you!

I just don't see a new company starting up that one day will need Bazel doing that with Bazel. There is just too much overhead, you won't ever become that company that needs Bazel if you start out that way.
Seconded. Any org with fewer than a few hundred engineers (and many with that many and more) would do better to stay away from this. I've had the dubious honor of using it for one project and to me the slogan is 'tools should work, not require attention', rather than the opposite. Bazel will require a lot of your attention and for smaller companies that could easily be a big percentage of their available capacity.

For very large organizations with the capability of assigning one or more teams to tooling it may very well be the right choice.

Could you elaborate? I've been using it for a decade, for all my projects big and small, and it's been a _massive_ time saver compared to any of the alternatives. It's fast, well documented, and it doesn't rebuild/retest stuff when it doesn't have to. I've also done multiplatform builds with it, as well as cross-builds to ARM. Not once did I have the need for any "upkeep". At Google there was a team to do that, of course, but even outside Google, even very early on when Bazel was just released, the maintenance is pretty minimal, and the build files are by far the most readable of any build system I have used so far in 20+ years in this industry.
> run as fast as you can in the opposite direction

But which one? Are there any other (non blaze-like) build systems enabling hermetic (possibly remote) builds and caching?

If you don't need these properties and have a mono language project, the language's native build system sure fits and may be a better choice.

There is build2. It has "high-fidelity" (instead of hermetic) builds meaning that besides sources it keeps track of changes to options, compilers, etc. This gives you similar benefits at a fraction of the cost. There is no distributed compilation or caching yet but it's coming. In other benefits, it doesn't need Java or Python (or any other "platform").
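
The general idea of tracking more than just sources can be sketched as a cache key that also covers the compiler identity and the options. This is an illustration of the concept only, not build2's actual mechanism; `action_key` and its parameters are hypothetical.

```python
import hashlib


def action_key(source: bytes, compiler_id: str, options: list[str]) -> str:
    """Hash everything that can influence the output of one compile action.

    `compiler_id` stands in for whatever identifies the toolchain
    (for example its version string); how it is obtained is assumed.
    """
    h = hashlib.sha256()
    for part in (source, compiler_id.encode(), "\0".join(options).encode()):
        # Length-prefix each field so adjacent fields cannot be confused.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


base = action_key(b"int main(){}", "gcc 9.2.0", ["-O2"])
print(base != action_key(b"int main(){}", "gcc 9.2.0", ["-O3"]))  # options changed
```

A result is reused only when the key matches, so changing a flag or upgrading the compiler invalidates the cached output even though the source is untouched.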
Assuming you didn't actually care about hermetic builds, the challenge with build2 is that it's C++ only AFAIK. Larger orgs turn into polyglot scenarios (Python/bash for scripting/gluing things together, Go for web services, C/C++ for high performance code, now Rust, etc).

You can of course try to use the native solution on each but that makes it more difficult for people to jump between projects/languages as the syntax for describing the build changes. Moreover for centralized build infra this becomes more difficult to orchestrate/co-ordinate because now you have to add remote caching/parallel compilation & whatnot to multiple places (with all the associated challenges of trying to upstream the same set of logical changes into many different projects with their own maintenance schedules/philosophies).

One of the main benefits of Bazel (and similar systems) is that you get a build cache that you can mostly trust. When you have a project that takes an hour or more to build and test, and lots of machines to run a distributed build, it really makes a difference.

If you have a small project that you can rebuild in a couple of minutes, Bazel is probably overkill.

Hey Drew, I'm a huge fan and have a lot of respect for your work.

I feel like I see the pattern of people on HN being disappointed in some of the tools that come out of Google and other large engineering orgs, when they don't work out really well in orgs that are not operating at the same scale. People have similar complaints about the complexity of other projects that come out of Google. K8s comes to mind as one such example. Often times these tools must be robust to such a large variety of uses that they are simply overkill for smaller organizations. I'll readily admit that I could be wrong and Bazel is simply poorly designed, but it is perhaps worth considering that the build system used by an engineering team of 50 need not be as complex as the build system used by one of the largest engineering orgs in the world. My guess is we'd see a lot less backlash if people tried to step off the hype train for a moment and critically evaluate whether they really need to use something like Bazel or K8s when something simpler would suffice.

Bazel advertises itself as

> Build and test software of any size, quickly and reliably

Given the comments here, a tagline focusing on its strengths for large orgs/projects probably would be better marketing.

They need to do a better job of making the assumptions behind the design of the tools clearer then. Because, from what I can tell, many people get the idea that the path to success is to do what Google does (even knowing about the meme that people just try and copy Google). This doesn't just apply to their software tools, but also to their corporate processes (OKRs, etc).
> the path to success is to do what Google does

This kind of cargo-cult process copying has infested the start-up world and is akin to sending a lot of shirts to the laundromat because that's what rich people do and you want to be rich too.

These things work for large companies because they are large companies. Their problems and associated solutions rarely if ever are a good match for the kind of issues that your average start-up contends with, especially early on in the life cycle.

You and your buddy the first-hire developers are not going to gain anything by copying the Spotify development model, and other examples in that vein.

OKRs came from Intel, FWIW. Google got them via John Doerr.
I was talking with an engineer who saw two people burn out over Bazel, though the specific gripe was with Scala support. I'd expect first-class languages at Google (C++, Java, Python, Go) to get better support.
The languages themselves have decent support. The problem is that it works great if you code the way Google does internally with all your dependencies vendored. Outside of the googleplex where we have, you know, package managers, bazel adds a ton of complexity and bugginess. The core algorithms are battle-hardened in Google, and the third-party package manager support is an afterthought tacked on to the open-sourced version.

I don't blame the bazel authors, but the development process it was designed for is not the development process of 99% of companies out there. Maintaining BUILD files for all of your vendored dependencies is expensive for your company. You need a full time team working on it.

Unless hermeticity and build correctness issues are absolutely killing your team's productivity (and at a certain size, they might be!) think twice before adding bazel to your maintenance overheads. You can always move to it later if you need it, and it might be more stable and have a better 3rd party package story by then.

You were an order of magnitude too small to need Bazel.
I think it entirely depends on how you use tools. Where I currently work people decided that it would be a great idea to extend waf (another niche build system) with all kinds of features. If you ever worked with waf you can probably imagine how bad that can turn out. As long as you stick to bazel’s default features and don’t start to extend it, I’ve found it really pleasant to use, especially for C++ and Python projects (where you want to expose C++ libraries to Python as well). If you start to extend it, it can probably become horrible really quickly. The only experience I have with that are abortive attempts at integrating SystemVerilog tools (turned out to be hard) and integrating a custom GCC toolchain (worked fairly well).