Hacker News
by jchw 2728 days ago
No need to check out a terabyte of code. If your repo is scaling that high, you're going to want a VFS layer. Microsoft made a VFS layer for Git. As you might imagine, you simply grab files as needed, and your version control just deals with diffs for the most part. Google's own monorepo is proprietary but the Bazel build system is open source and would work great with a VCS hooked up with a VFS layer.
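
To make "grab files as needed" concrete, here is a minimal sketch of the lazy-fetch idea in Python. It is not how Microsoft's VFS for Git or Google's system is implemented; the `LazyWorkspace` class and the `fetch` callback are hypothetical names made up for illustration.

```python
from typing import Callable, Dict


class LazyWorkspace:
    """Fetch a file's contents on first read, then serve it from a local cache."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # `fetch` is a hypothetical callback that downloads one file
        # from the remote store; nothing is downloaded up front.
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        # Only files that are actually read ever get transferred.
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
        return self._cache[path]
```

A build that touches only a handful of packages would then pull only those files, no matter how big the repo is.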

I want to like Bazel. I really do. But on first encounter the syntax is filled with sigils that don't seem to have obvious differences or purpose for existence. Then it turns out that I and others have spent as much time fighting it as using it. Lastly the coverage of ecosystems is sparse and there does not seem to be a lot of activity around extending them -- doing the boring, tedious, unloved work of dealing with everyone's quirks and bugs and corner cases and annoyances (been there, done that).

Again: I wish it was a smooth experience. Because I like the ideas very much. But it wasn't when I tried and I don't know anyone -- outside of Google -- for whom it was a smooth experience.

I can’t speak to the actual implementation, but I’m surprised at your description of the syntax as “filled with sigils”, as the syntax is basically Python -- isn’t that about as easy as you can get?

I find Bazel’s syntax much easier to deal with than other build languages that use JSON (essentially the same Python syntax but with lots of extra quotes everywhere and extra fussiness about where commas are allowed).

    bazel build //main:hello-world
I'm sure the double slashes and colon have important differences. It is not obvious what they are.

    cc_binary(
      name = "hello-world",
      srcs = ["hello-world.cc"],
      deps = [
        ":hello-greet",
        "//lib:hello-time",
      ],
    )
It's not instantly obvious why one is :hello-greet and the other is //lib:hello-time.

I could swear I've seen @ floating around as well.

As I said above, I am sure these are all very sensible. But I am just tired of memorising minilanguages embedded in strings. I don't want to any more.

Completely valid concern not to want to keep memorizing mini-languages.

In this case, the double slashes are absolute "paths" relative to the top of the workspace, and the part after the colon is a relative "path" to another Bazel target.

I put "paths" in quotes because these are meaningfully different from the true filesystem equivalents; avoiding confusion with real absolute and relative filesystem paths is probably why they made their own syntactic mini-language.

[The sibling reply to mine, referencing Piper and Perforce, goes into a bit more detail on the specifics and the origin of the // prefix.]

What would the better way have been for them to do this?

> What would the better way have been for them to do this?

I don't know, off the top of my head (having been on the other side of this conversation, I am aware how frustrating that answer is). But I know I couldn't keep it straight when I was fighting Bazel and that I gave up. And anecdotally I am not alone: I have seen Bazel torn out of multiple projects, sometimes quite painfully.

Bazel is definitely designed for a very different model than how most of the world works. (I.e., Google's internal model.)

This clearly shows in Bazel's Python support: its internal version (Blaze) gets used quite often with Python inside Google's monorepo, and it works very nicely in that role, but that's a very different way of using Python than approximately the entire rest of the world. It's still Python, to be clear, just everything else is pretty different. ;)

Still, Bazel's model is pretty great if you adjust your brain, tooling, and patterns to it. I accept that most people don't. And some of its preferred usage patterns are more trouble than they're worth in a typical small shop anyway, at least with the usual other tooling one has to integrate with.

Tradeoffs...

Piper, Google's source control system, has roots in Perforce. In Perforce, depot paths start with //.

The ":" is a bit different. Just "//lib" means "//lib:lib", i.e. it points to the "lib" target in the /lib/BUILD file, while "//lib:hello-time" points to the "hello-time" target in the same file. So leaving out the ":name" in "//dir:name" means name="dir", i.e. "//dir:dir". At first this is strange, but then you get used to it: your default target is named after the folder it's sitting in.

It is not a smooth experience outside of Google because the truth is bootstrapping a proper Bazel setup is not actually that easy. If you want hermetic builds for real, you need a hermetic build environment. Bazel tries to accomplish this with a workspace setup in each repo, but unfortunately it's definitely limited and imperfect.

The Bazel rules for languages are also not perfect imo. For example, I dislike hooking Bazel up to tools like NPM and Webpack. I'd rather have a system that could sync NPM modules into third_party automatically and set up Bazel files for them, then have a bundling system that is native to Bazel and takes full advantage of its caching and pure building.

Bazel is imperfect on Windows as well. I have tried to help but admittedly it is hard work and it'll take time. I wanted to get Bazel Watcher working on Windows, but my PR is stalled because the Windows API is truly maddening at times. (Feel free to find the PR, it's almost hilarious how convoluted it is to effectively kill a tree of processes. Linux of course is imperfect here but it lets you get 95% of the way much more easily.)

However, here's what I will say: if you are in an organization, I think Bazel really shines. If you can take time to write some custom tools and rules and really integrate your software into Bazel, it can be an awesome experience. Sadly the publicly available rules try pretty hard to match existing semantics and fall short of showing off how nice Bazel can be in some cases, but I think C and C++ is a great area where Bazel shines above the pack.

Another plus: it is Amazing having a build system that crosses languages. Does your Python script depend on a C module and connect over TCP to a Go program? No problem, all of that is easy to express. Do you want to have a Go script that writes a TypeScript file that gets compiled and bundled into your app's JS bundle? Once again this is all fairly natural and you can easily accomplish it with a simple combination of normal build rules and a genrule.

And Starlark is a reasonably complete almost-subset of Python, so it's easy to compose, extend and refactor your rules. If you want to generate a matrix of targets for say, testing across browsers and platforms, you can do that, and make it reusable too.
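
As a rough illustration of the "matrix of targets" idea, here is a sketch in plain Python rather than real Starlark, so it runs without Bazel. The function name `browser_test_matrix` and the dictionary fields are hypothetical; in an actual .bzl macro each entry would be a call to whatever test rule you use, not a dict.

```python
def browser_test_matrix(name, browsers, platforms):
    """Return one target description per (browser, platform) pair."""
    return [
        {
            # Each combination gets its own predictable target name.
            "name": "%s_%s_%s" % (name, browser, platform),
            "browser": browser,
            "platform": platform,
        }
        for browser in browsers
        for platform in platforms
    ]
```

Because it is an ordinary function, the same matrix can be reused for any test suite by passing a different `name`.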

Basically my advice with Bazel:

- Check out how well it works with C and C++, and I think Java also works quite well. This should give you an idea of how it looks when done right.

- Don't constrain yourself to what Bazel offers in terms of rules. Starlark is hugely powerful and you can easily make your own rules for things.

P.S.: the weird path syntax is probably many parts legacy, but it's not actually super hard to understand. When you see a colon, the left side of the colon is a path to a folder, and the right side is a target name. When you see double slashes, it means absolute path relative to root of workspace. If the colon is omitted the target name is assumed to be the same as the folder name.

//:base -> the base target in the BUILD file in the root of the workspace

//base -> //base:base -> the base target in the BUILD file in the base folder relative to the root of the workspace

//app/ui:tests -> the tests target in the BUILD file in the app/ui folder relative to the workspace root

:genfile -> the genfile target in the BUILD file in the current directory

There is some context sensitivity about how to refer to files versus targets and whether you're referring to runfiles, output files, or build files, but most of the time it's surprisingly obvious actually. When it comes to files versus targets, it largely works a bit like Make except there's namespacing for input files vs output files (and runfiles, but that's another topic.)

There is also an @ syntax used to refer to paths outside the current workspace. It mainly comes into play when importing rules.
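
For anyone who finds rules easier to read as code, the label forms above can be sketched as a small Python function. This is only an illustration of the rules described in this comment, not Bazel's own parser: `Label` and `parse_label` are made-up names, and other forms Bazel accepts (such as a bare target name with no colon) are deliberately rejected to keep it short.

```python
from typing import NamedTuple


class Label(NamedTuple):
    repo: str      # "" means the current workspace
    package: str   # folder path from the workspace root, "" for the root
    target: str


def parse_label(label: str, current_package: str = "") -> Label:
    """Expand a label string into (repo, package, target)."""
    repo = ""
    if label.startswith("@"):
        # "@repo//pkg:target" refers to something outside this workspace.
        repo, sep, rest = label[1:].partition("//")
        if not sep:
            raise ValueError("unsupported label form: %r" % label)
        label = "//" + rest

    if label.startswith("//"):
        # Double slash: the folder path starts at the workspace root.
        package, colon, target = label[2:].partition(":")
        if not colon:
            # Colon omitted: the target is named after the folder.
            target = package.rsplit("/", 1)[-1]
    elif label.startswith(":"):
        # Leading colon: a target in the current folder's BUILD file.
        package, target = current_package, label[1:]
    else:
        raise ValueError("unsupported label form: %r" % label)

    if not target:
        raise ValueError("label has no target name: %r" % label)
    return Label(repo, package, target)
```

With this, `parse_label("//base")` and `parse_label("//base:base")` give the same result, and `parse_label(":genfile", current_package="app/ui")` resolves to the `genfile` target in `app/ui`.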

> However, here's what I will say: if you are in an organization, I think Bazel really shines. If you can take time to write some custom tools and rules and really integrate your software into Bazel, it can be an awesome experience. ... Another plus: it is Amazing having a build system that crosses languages.

This is pretty much what I think of when I want to like Bazel. I wish we had it on Cloud Foundry. Or, rather, I wish it had existed 5 years ago and had been used on Cloud Foundry from the beginning, because CF and its associated projects have hundreds of repositories and these have mostly been kept in sync through mountains of tests and oceans of automation. It works, but I know that in another universe it works better.

Would you attribute the C/C++ success to the lack of a "native" build tool?

I would say it is likely that the lack of a native C++ build tool helped Bazel to not have to compromise on how it integrates compilers into the system. I think that C++ is also just a good fit for the design; not all languages are. Interpreted languages fit into the system a bit less well in my opinion (but I still like that they are treated with some level of consistency.)

And what would you say about https://news.ycombinator.com/item?id=18821549?

Sorry not copying it here to avoid repost.