| HN Mirror



	by benmarten 2722 days ago
	It's not. Why do I have to check out a terabyte of code that I don't need, even if the code is modularized?

7 comments

jchw 2722 days ago

No need to check out a terabyte of code. If your repo is scaling that high, you're going to want a VFS layer. Microsoft made a VFS layer for Git. As you might imagine, you simply grab files as needed, and your version control just deals with diffs for the most part. Google's own monorepo is proprietary but the Bazel build system is open source and would work great with a VCS hooked up with a VFS layer.

jacques_chester 2722 days ago

I want to like Bazel. I really do. But on first encounter the syntax is filled with sigils that don't seem to have obvious differences or purpose for existence. Then it turns out that I and others have spent as much time fighting it as using it. Lastly the coverage of ecosystems is sparse and there does not seem to be a lot of activity around extending them -- doing the boring, tedious, unloved work of dealing with everyone's quirks and bugs and corner cases and annoyances (been there, done that).

Again: I wish it was a smooth experience. Because I like the ideas very much. But it wasn't when I tried and I don't know anyone -- outside of Google -- for whom it was a smooth experience.

iainmerrick 2722 days ago

I can’t speak to the actual implementation, but I’m surprised at your description of the syntax as “filled with sigils”, as the syntax is basically Python -- isn’t that about as easy as you can get?

I find Bazel’s syntax much easier to deal with than other build languages that use JSON (essentially the same Python syntax but with lots of extra quotes everywhere and extra fussiness about where commas are allowed).

jacques_chester 2722 days ago

    bazel build //main:hello-world

I'm sure the double slashes and colon have important differences. It is not obvious what they are.

    cc_binary(
      name = "hello-world",
      srcs = ["hello-world.cc"],
      deps = [
        ":hello-greet",
        "//lib:hello-time",
      ],
    )

It's not instantly obvious why one is :hello-greet and the other is //lib:hello-time.

I could swear I've seen @ floating around as well.

As I said above, I am sure these are all very sensible. But I am just tired of memorising minilanguages embedded in strings. I don't want to any more.

jkaplowitz 2722 days ago

Completely valid concern not to want to keep memorizing mini-languages.

In this case, the double slashes are absolute "paths" relative to the top of the workspace, and the part after the colon is a relative "path" to another Bazel target.

I put "paths" in quotes because these are meaningfully different from the true filesystem equivalents; avoiding confusion with real absolute and relative filesystem paths is probably why they made their own syntactic mini-language.

[The sibling reply to mine, referencing Piper and Perforce, goes into a bit more detail on the specifics and the origin of the // prefix.]

What would the better way have been for them to do this?

jacques_chester 2722 days ago

> What would the better way have been for them to do this?

I don't know, off the top of my head (having been on the other side of this conversation, I am aware how frustrating that answer is). But I know I couldn't keep it straight when I was fighting Bazel and that I gave up. And anecdotally I am not alone: I have seen Bazel torn out of multiple projects, sometimes quite painfully.

malkia 2722 days ago

Piper, Google's source control system, has roots in Perforce. In Perforce, depot paths start with //.

The ":" is a bit different: just "//lib" means "//lib:lib", i.e. it points to the "lib" target in the /lib/BUILD file, while "//lib:hello-time" points to the "hello-time" target in the /lib/BUILD file. So leaving out the ":name" in "//dir:name" means name="dir", i.e. "//dir:dir". At first this is strange, but then you get used to it. Your default target is named after the folder it's sitting in.

jchw 2722 days ago

It is not a smooth experience outside of Google because the truth is bootstrapping a proper Bazel setup is not actually that easy. If you want hermetic builds for real, you need a hermetic build environment. Bazel tries to accomplish this with a workspace setup in each repo, but unfortunately it's definitely limited and imperfect.

The Bazel rules for languages are also not perfect imo. Like I dislike hooking Bazel up to tools like NPM and Webpack. I'd rather have a system that could sync NPM modules into third_party automatically and set up Bazel files for them, then have a bundling system that is native to Bazel that allows taking full advantage of its caching and pure building.

Bazel is imperfect on Windows as well. I have tried to help but admittedly it is hard work and it'll take time. I wanted to get Bazel Watcher working on Windows, but my PR is stalled because the Windows API is truly quite maddening at times. (Feel free to find the PR, it's almost hilarious how convoluted it is to effectively kill a tree of processes. Linux of course is imperfect here but it lets you get 95% of the way much more easily.)

However, here's what I will say: if you are in an organization, I think Bazel really shines. If you can take time to write some custom tools and rules and really integrate your software into Bazel, it can be an awesome experience. Sadly the publicly available rules try pretty hard to match existing semantics and fall short of showing off how nice Bazel can be in some cases, but I think C and C++ is a great area where Bazel shines above the pack.

Another plus: it is Amazing having a build system that crosses languages. Does your Python script depend on a C module and connect over TCP to a Go program? No problem, all of that is easy to express. Do you want to have a Go script that writes a TypeScript file that gets compiled and bundled into your app's JS bundle? Once again this is all fairly natural and you can easily accomplish it with a simple combination of normal build rules and a genrule.

And Starlark is a reasonably complete almost-subset of Python, so it's easy to compose, extend and refactor your rules. If you want to generate a matrix of targets for say, testing across browsers and platforms, you can do that, and make it reusable too.
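To give a rough idea of the "matrix of targets" point, here is a minimal sketch in plain Python rather than real Starlark. The function name `browser_test_matrix` and the dictionary fields are made up for illustration; in an actual macro each entry would presumably become a call to some test rule instead of a dictionary.

```python
def browser_test_matrix(name, browsers, platforms):
    """Build one hypothetical test-target description per (browser, platform) pair."""
    return [
        {
            # Derive a unique target name from the base name and the matrix cell.
            "name": "%s_%s_%s" % (name, browser, platform),
            "browser": browser,
            "platform": platform,
        }
        for browser in browsers
        for platform in platforms
    ]


# Example inputs are assumptions, not taken from any real project.
targets = browser_test_matrix("ui_tests", ["chrome", "firefox"], ["linux", "macos"])
for target in targets:
    print(target["name"])
```

The reusable part is just the function: any package could call it with its own base name and lists.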

Basically my advice with Bazel:

- Check out how well it works with C and C++, and I think Java also works quite well. This should give you an idea of how it looks when done right.

- Don't constrain yourself to what Bazel offers in terms of rules. Starlark is hugely powerful and you can easily make your own rules for things.

P.S.: the weird path syntax is probably many parts legacy, but it's not actually super hard to understand. When you see a colon, the left side of the colon is a path to a folder, and the right side is a target name. When you see double slashes, it means an absolute path relative to the root of the workspace. If the colon is omitted, the target name is assumed to be the same as the folder name.

//:base -> the base target in the BUILD file in the root of the workspace

//base -> //base:base -> the base target in the BUILD file in the base folder relative to the root of the workspace

//app/ui:tests -> the tests target in the BUILD file in the app/ui folder relative to the workspace root

:genfile -> the genfile target in the BUILD file in the current directory

There is some context sensitivity about how to refer to files versus targets and whether you're referring to runfiles, output files, or build files, but most of the time it's surprisingly obvious actually. When it comes to files versus targets, it largely works a bit like Make except there's namespacing for input files vs output files (and runfiles, but that's another topic.)

There is also an @ syntax used to refer to paths outside the current workspace. It mainly comes into play when importing rules.
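As a rough illustration of the rules above, here is a minimal Python sketch that splits a label into its parts. It is not Bazel's actual parser: the `Label` type and `parse_label` function are hypothetical names, and it only handles the forms listed in this comment (`//pkg:target`, `//pkg`, `:target`, and an `@repo//...` prefix), rejecting everything else.

```python
from typing import NamedTuple


class Label(NamedTuple):
    repo: str     # "" means the current workspace
    package: str  # folder relative to the workspace root, "" for the root itself
    target: str   # target name inside that folder's BUILD file


def parse_label(label: str, current_package: str = "") -> Label:
    """Resolve a label string using the simplified rules described above."""
    repo = ""
    if label.startswith("@"):
        # "@repo//pkg:target" points outside the current workspace.
        repo, sep, rest = label[1:].partition("//")
        if not sep:
            raise ValueError("expected '//' after repository name: %r" % label)
        label = "//" + rest
    if label.startswith("//"):
        # Left of the colon: folder path from the workspace root.
        package, colon, target = label[2:].partition(":")
        if not colon:
            # "//base" is shorthand for "//base:base".
            target = package.rsplit("/", 1)[-1]
    elif label.startswith(":"):
        # ":genfile" refers to a target in the current folder's BUILD file.
        package, target = current_package, label[1:]
    else:
        raise ValueError("label form not covered by this sketch: %r" % label)
    if not target:
        raise ValueError("missing target name: %r" % label)
    return Label(repo, package, target)


print(parse_label("//base"))
print(parse_label(":genfile", current_package="app/ui"))
```

The `current_package` argument is what makes `:genfile` context-dependent: the same string resolves differently depending on which BUILD file it appears in.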

jacques_chester 2722 days ago

> However, here's what I will say: if you are in an organization, I think Bazel really shines. If you can take time to write some custom tools and rules and really integrate your software into Bazel, it can be an awesome experience. ... Another plus: it is Amazing having a build system that crosses languages.

This is pretty much what I think of when I want to like Bazel. I wish we had it on Cloud Foundry. Or, rather, I wish it had existed 5 years ago and had been used on Cloud Foundry from the beginning, because CF and its associated projects have hundreds of repositories and these have mostly been kept in sync through mountains of tests and oceans of automation. It works, but I know that in another universe it works better.

fxfan 2722 days ago

Would you attribute the c/cpp success to a lack of "native" build tool?

jchw 2722 days ago

I would say it is likely that the lack of a native C++ build tool helped Bazel to not have to compromise on how it integrates compilers into the system. I think that C++ is also just a good fit for the design; not all languages are. Interpreted languages fit into the system a bit less well in my opinion (but I still like that they are treated with some level of consistency.)

fxfan 2722 days ago

And what would you say about https://news.ycombinator.com/item?id=18821549?

Sorry not copying it here to avoid repost.

erulabs 2722 days ago

If a mono-repo has a terabyte of code, or if 10 small repos have 1/10th a terabyte each, what have you really gained? In any case, git LFS solves large file storage effectively, as do a number of other artifact storage solutions, and a repo with a terabyte of code is _not_ going to be trivially split apart, since it would be, by a factor of thousands, the biggest codebase ever created by humankind.

twblalock 2722 days ago

If I only need to check out one of the smaller repos then I've gained quite a lot in terms of download speed, storage size, etc. Git LFS adds a lot of complexity I'd rather avoid.

erulabs 2722 days ago

Sure but then you only have some small portion of the total infrastructure, which adds its own layer of complexity for the people reviewing your changes :P It's all trade offs, is all I'm saying - I honestly still can't decide between the two, although for all companies sub 20 people, I'd for sure stick with a single repo.

tracker1 2722 days ago

If I'm working on Application X, wtf do I care about infrastructure code? Or for that matter, as a specific... if someone is working on Google Maps, should they care about the codebase for Google Inbox for Android?

malkia 2722 days ago

You may be relying on a shared component for your app. You simply put a deps reference to it in your BUILD Bazel (Blaze) file, e.g. "//base:something". Now that "//base:something" might itself rely on other deps, but that should not be of your concern.

So - what's stopping you from depending on (using) anything else? Or how do you stop people from doing this? Bazel (Blaze) has visibility rules, which by default are private - e.g. the rules in your packages are hidden, unless explicitly made public, or alternatively you can whitelist one by one which other packages (//java/com/google/blah/myapp) can depend on you.

Let's say there is a new cool service, and your team wants to try it out... but it's not out there for everyone to use; it's in alpha, beta, whatever stage. So you ask the owning team for permission, or simply create a CL that adds your package (by target, name, or "..." folder resolution) to the whitelist, and eventually you will be whitelisted (if that's a good idea, and approved). Another example: some library got deprecated and is slowly being replaced with another, so instead of being "//visibility:public" it now just whitelists its last remaining users... Well, it's probably not a good idea to get added to that list, as the whole thing is going away soon (yes, Google tends to deprecate internally even faster than externally - ... which is good!). But such mechanisms are helpful in getting this to work correctly.
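To make the whitelisting idea concrete, here is a small Python sketch of how such a check could work. It is a simplified model and not Bazel's implementation: `is_visible` is a hypothetical helper, it only understands `//visibility:public`, `//visibility:private`, and the `//pkg:__pkg__` / `//pkg:__subpackages__` forms, and it ignores package groups entirely.

```python
def is_visible(target_package, visibility, consumer_package):
    """Return True if a target may be depended on from consumer_package.

    target_package:   package that owns the target, e.g. "base"
    visibility:       list of visibility specs attached to the target
    consumer_package: package that wants to list the target in its deps
    """
    # Targets in the same package can always see each other.
    if consumer_package == target_package:
        return True
    for spec in visibility:
        if spec == "//visibility:public":
            return True
        if spec == "//visibility:private":
            continue
        package, _, kind = spec[2:].partition(":")
        if kind == "__pkg__" and consumer_package == package:
            # Exactly this one package is whitelisted.
            return True
        if kind == "__subpackages__":
            # This package and everything below it is whitelisted.
            if package == "" or consumer_package == package:
                return True
            if consumer_package.startswith(package + "/"):
                return True
    # An empty list behaves like private: nobody outside the package gets in.
    return False


allowed = ["//java/com/google/blah/myapp:__pkg__"]
print(is_visible("base", allowed, "java/com/google/blah/myapp"))
print(is_visible("base", allowed, "java/com/google/other"))
```

Under this model, getting whitelisted just means a reviewed change that appends your package to the target's visibility list.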

matthewmacleod 2722 days ago

Does Application X rely on particular infrastructure configuration? Or does Google Inbox on Android integrate with Google Maps?

There are dependencies everywhere. Monorepos are one of the tools which can be used to make dealing with them easier in some cases. They’re not an absolute solution, nor appropriate for all circumstances, but no tool is!

nkozyra 2722 days ago

> If a mono-repo has a terabyte of code, or if 10 small repos have 1/10th a terabyte each, what have you really gained?

If it's a small company where every developer touches every part of the application, sure. Taking the FAANG approach if you're not part of that acronym sounds like introducing inefficiency.

klodolph 2722 days ago

If it's a "small" company then I'd expect that one Git repo would do just fine for all or at least most of the code. When I think small, I think ~10 or 20 developers. If you have reasonable hygiene about things like keeping binaries out of your Git repo (excluding consideration of e.g. LFS here) then the whole repo size will stay fairly reasonable. As long as you have one or two Git mavens on your team it should be dandy.

I'd expect to see problems with this approach once you get into the 100s or 1000s of developers. The tooling for this scale of repository isn't as mature.

nkozyra 2722 days ago

Sorry, what am I missing? That's exactly what I was saying - this stops making sense anywhere in between "small" and "the big boys"

klodolph 2722 days ago

> Taking the FAANG approach if you're not part of that acronym sounds like introducing inefficiency.

Is this not saying that small companies should avoid monorepos?

nkozyra 2722 days ago

Specifically excluded in the preceding sentence in my post.

skj 2722 days ago

Sounds like a tooling problem. We shouldn't use the current state of tooling as an excuse.

jessaustin 2722 days ago

Isn't the entire argument about the current (or maybe "immediately foreseeable") state of tooling? We don't really care one way or the other, in a philosophical sense. What works?

skj 2722 days ago

When the tools aren't good enough, we can either toss up our hands and say "I guess it's always going to be like this!", or we can get to work and make better tools.

jessaustin 2722 days ago

This is an argument about how to use current tools. TFA doesn't argue that mono will be great once we work really hard. It argues that mono is great now. Thread parent has a specific objection to that argument. You don't reasonably counter that objection with statements about morality.

skj 2722 days ago

A few things to note:

- I was replying to a comment, not the article.

- The article spoke about points that were largely independent of the current or future state of tooling. Instead, it focused on fundamental issues with mono- vs poly-repo systems. Most directly, being forced to fix migrations and incompatibilities immediately rather than letting versions skew.

If you want to batter someone for not arguing for or against the points in the article, you can do it with the comment I was replying to, or with your own comment just now.

rhacker 2722 days ago

It's not for everyone, but damn, why is there a TERABYTE of code? Just curious - assets? checking in binaries?

malkia 2722 days ago

Test protos. Evaluated configs. Golden data. JAR archives, etc.

alexnewman 2721 days ago

Signs your build system is never going to be adopted outside of people cargo culting you?

  - [x] Namespaces and the like without much security benefit
  - [x] Giant Java dependency
  - [x] Strange syntax and glyphs

mwkaufma 2722 days ago

We have a Perforce monorepo with ~80 GB total payload for the whole thing, but everyone uses streams to filter it, so that's not a problem.

hinkley 2722 days ago

I think there's a false dichotomy here.

In the post yesterday one of the arguments was that if nobody checks out all of the code then what's the value of having the code all in one place?

Last monorepo I worked on, individual contributors checked out just the tree they were working on (we had a suite of applications with several shared modules). We made it simple and straightforward for them to get what they wanted and ignore people whose work didn't impact them.

But the senior people, who were better with architecture and version control trivia, checked out the entire thing. They would steward any cross-cutting changes that needed to be done, and make sure any callers to shared libraries were updated in the face of breaking changes. They were also backstopped by the build plans, (some of) which also checked out the entire thing.

mwkaufma 2722 days ago

Streams aren't modules -- they're views. If someone takes you as a dependency and wants you to have visibility on them they add themselves to your stream so you pull down their directory as well.

slobotron 2722 days ago

Chances are you will end up downloading a lot of dependencies anyways, why not have git deliver it all?

nkozyra 2722 days ago

Huh? You'd download dependencies for the repos you need, not the code and dependencies for the entire company.

The latter could be several orders of magnitude larger, and in a larger organization that could be a lot of unnecessary code that any given dev may never touch.

pvorb 2722 days ago

But imagine the increased productivity of your devs if they only had to check out a single repo. Everyone has the same organization of projects on their machine. All tools are in one place...

nkozyra 2722 days ago

I don't understand. Where is the argument for more productivity?

Too 2722 days ago

A: You avoid issues such as Readme files stating, "before compiling you have to git clone ../commonA, ../commonB". These always tend to get stale, so in reality you also have to git clone ../commonC, costing you tons of hours of troubleshooting.

B: A developer working daily in component A finds a bug in component B. He just has to change the code and commit it for review, instead of having to understand the specifics of working with component B's repository.

Tempest1981 2722 days ago

We have one large-ish repo that keeps showing "This repository currently has approximately 547 loose objects."

We keep pruning and gc'ing with different flags, but pulls just seem far slower than other smaller repos.