by weberc2 2438 days ago
Can you elaborate on “monorepos do not prevent you from checking packages into source control” and how that helps to avoid recompiling everything? Why would you check a package into source control anyway? Surely source control is for source code? And I lean toward monorepos, btw, but there are still lots of obstacles and monorepo proponents don’t tend to acknowledge them or offer clear suggestions for how to solve or work around them.
3 comments

You can use something like a shared binary repo such as maven or you could just check in dependencies and not worry about an external server being available for builds.

>Surely source control is for source code?

This is just pedantry. Checking in binaries is a pragmatic solution that solves a lot of problems.

I was rather under the impression that checking in binaries was discouraged because it led to performance issues and tends to blow up the repository size. I don't think it's just pedantry.
I wasn’t trying to be a pedant, I’ve just never heard of anyone doing this. I was wondering how it helped solve the problem of not rebuilding everything.
In short, the binaries are already built. Usually it's faster to link to a prebuilt binary than to build from scratch.
So where do these binaries get built and how does the system know which binaries to rebuild for a given change? If developers are building binaries and committing them directly, doesn’t that open up security or even correctness issues? How does this approach satisfy compliance concerns (how can the CTO or a manager sign off on the changes that went into the binary if it’s just something a random developer committed?)? How does this scale to tens of deployments per day? These are hard monorepo problems, and they keep being handwaved away.
Suppose the binaries in question are build tools or similar: then this is good, because they never get rebuilt. The paperwork is done, the binaries get committed to version control, and everybody that builds the code then builds the code with the approved binaries. Everybody is happy.

Suppose the binaries are build byproducts, and people just check this stuff in, like, whatever. Well, if somebody needs to sign off on the output, that's a problem - so that person then doesn't use what's in the repo, but instead builds the output from scratch, from the source code, hopefully with known build tools (see above!), and signs off on whatever comes out.

But, day to day, for your average build, which is going to be run on your own PC and nowhere else, nobody need sign off on anything. If you link with some random object file that was built on a colleague's machine, say, then that's probably absolutely fine - and even if it isn't, it's still probably fine enough to be getting on with for now. If you work for the sort of company that's worried about this stuff, there's a QA department, so any issues arising are not going to get very far.

Overall, this stuff sorts itself out over time. Things that are problems end up having procedures introduced to ensure that they stop happening. And things that are non-problems just... continue to happen.

>So where do these binaries get built and how does the system know which binaries to rebuild for a given change?

For simple things, if the code in a directory changes then the CI system does a rebuild of that directory. You can have the CI system either validate that the binary matches or commit the binary itself. For more complicated things you'll have a build system such as Bazel, which figures out what changed.
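A minimal sketch of the simple case described above, assuming each build unit lives under its own directory. The function name `dirs_to_rebuild` and the directory names are hypothetical, not taken from any particular CI system.

```python
from pathlib import PurePosixPath


def dirs_to_rebuild(changed_files, build_dirs):
    """Return the build directories that contain at least one changed file."""
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        for build_dir in build_dirs:
            dir_parts = PurePosixPath(build_dir).parts
            # Compare whole path components so "libfoobar/" never matches "libfoo".
            if parts[:len(dir_parts)] == dir_parts:
                affected.add(build_dir)
    return sorted(affected)


if __name__ == "__main__":
    # Hypothetical change set, as a CI job might get from a diff of the commit.
    changed = ["libfoo/src/a.c", "docs/readme.txt", "tools/gen/main.py"]
    print(dirs_to_rebuild(changed, ["libfoo", "libbar", "tools/gen"]))
```

This only looks at where files live, so it cannot tell that a change in one directory affects a binary built in another.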

(Sorry for being terse—on mobile). Validate the binary matches what? If the compiler has to compile the artifact to verify the artifact provided by the developer, why bother having the developer commit the artifact? The CI system could just do it. Never mind that having a bit-for-bit reproducible build is incredibly difficult. Anyway, such simple cases where a whole app lives under a single directory are vanishingly rare.
It's really not any different than depending on the exact version in some dependency manager. Instead of just the dependency config you check in the binary. When a dev needs a newer version of a dependency they can pull it down and check it in. You wouldn't check in random nameless binaries, just hard copies of things you would have linked to from a dependency repository.

This doesn't work well for dependencies where you're expected to be using the latest version of something that changes 10 times a day.

The rest of your questions are fairly irrelevant, as they would be answered the same way as in the dependency repo case, i.e., use official binaries.

...but this is closer to multi-repo than monorepo. If you're in a monorepo you might as well use the source.

> So where do these binaries get built and how does the system know which binaries to rebuild for a given change?

By the CI. All major CI/CD tools support rules like build binary x whenever a file under x-src/* changes; commit binary x when the ref matches /v[0-9.]+/; don't allow developers to manually push to these refs / paths; (run a script to) bump the dependent x of y whenever binary x changes; merge the bumped version if all tests still pass; etc.
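As a rough illustration of the first two rules above, here is a sketch in Python rather than in any real CI tool's config syntax. The function `ci_actions` and the action strings are hypothetical; the path glob and ref pattern are the ones quoted in the comment. It assumes the binary is only committed when it was also rebuilt in the same run.

```python
import re
from fnmatch import fnmatchcase

# Ref pattern quoted in the comment above: /v[0-9.]+/
RELEASE_REF = re.compile(r"v[0-9.]+")


def ci_actions(changed_files, ref, src_glob="x-src/*"):
    """Decide what a CI run should do for binary x (illustrative only)."""
    actions = []
    # Rule 1: build binary x whenever a file under x-src/ changes.
    if any(fnmatchcase(path, src_glob) for path in changed_files):
        actions.append("build x")
        # Rule 2: commit the built binary only on release-like refs.
        if RELEASE_REF.fullmatch(ref):
            actions.append("commit x")
    return actions


if __name__ == "__main__":
    print(ci_actions(["x-src/main.c"], "v1.2.0"))
    print(ci_actions(["x-src/main.c"], "main"))
    print(ci_actions(["y-src/main.c"], "v1.2.0"))
```

Restricting who may push to the release refs and bumping dependents would be separate rules in the CI tool itself and are not modeled here.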

The problem is dependency graphs aren’t strictly hierarchical, so it doesn’t suffice to say “rebuild whenever something under this directory changes”.
Not sure how people do this in practice. But in principle it seems rather straightforward.

A compiler is just a program that takes some input and creates some output. Both the compiler and the input can have a cryptographically secure hash. Putting both in a sealed box, like a docker image, with its own hash, gives you a program that takes no input and produces some output.

If the box changes, run it on a trusted machine and save the output together with a signed declaration of which box version produced it.
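A sketch of that idea, assuming the compiler and the input are available as bytes. The names `box_hash` and `OutputStore` are made up for illustration, the "build" is a stand-in function, and the signing step is left out entirely.

```python
import hashlib


def box_hash(compiler_bytes, source_bytes):
    """Hash of the 'sealed box': the compiler plus its input."""
    outer = hashlib.sha256()
    # Hash each part first so the boundary between the two is unambiguous.
    for blob in (compiler_bytes, source_bytes):
        outer.update(hashlib.sha256(blob).digest())
    return outer.hexdigest()


class OutputStore:
    """Keeps build outputs keyed by the hash of the box that produced them."""

    def __init__(self):
        self._outputs = {}
        self.builds_run = 0

    def get_or_build(self, compiler_bytes, source_bytes, build):
        key = box_hash(compiler_bytes, source_bytes)
        if key not in self._outputs:
            # The box changed (or was never seen): run it and record the result.
            self._outputs[key] = build(source_bytes)
            self.builds_run += 1
        return key, self._outputs[key]


if __name__ == "__main__":
    store = OutputStore()
    fake_compiler = b"cc 1.0"  # stand-in for the real compiler image

    def fake_build(src):
        return src.upper()  # stand-in for actual compilation

    store.get_or_build(fake_compiler, b"int main(){}", fake_build)
    store.get_or_build(fake_compiler, b"int main(){}", fake_build)  # reused
    print(store.builds_run)
```

This only pays off if the build is deterministic for a given box; anything that varies per compile would still make outputs differ for the same key.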

Docker makes this drastically easier (need the exact same versions of all libraries and the compiler), but there are still compile time things that are unique per-compile. Debian has been working hard to get hashes of binaries to be useful but the work is far from trivial.

(See also: trusting trust)

At Google we check in the source of every library into the monorepo and compile them ourselves with cached builds from a central server. I don't think we use package managers.
You don't have to use a package manager, that's just the approach the TiVo folks came up with a couple decades ago. They use RPM to package independent software modules and check them into (IIRC) a separate build repository which saves the last n months of work. A local config file is used to choose the binary package version to use, or, alternatively, the locally built files to use. They probably could have just made tarballs, since I don't think they used any of the dependency checking.
How do you track dependencies of dependencies? Do you need to manually add the full dependency tree and reimplement the dependency tracking through your internal system? If a project uses maven or gradle, you need to rewrite those files to point to your internal builds instead?
Not a Googler, but I think the answer is: yes. At least, it is for my monorepo company.

Usually somebody else has already gone through the work of doing it for you. Sometimes there are tools that do the translation for you. For example, Go modules are quite easy to translate to a BUILD file.

It’s actually not as bad as it sounds. You only have to do the hard stuff once, and every engineer in the org who uses it in the future is thankful for it.

They use a tool called Blaze (Google around for “Bazel” which is the open source tool inspired by it). Basically you model the dependency tree such that the tool knows which targets are affected by a certain change, and then Blaze builds them in a clean room environment such that an undeclared dependency would cause the build to fail (hermetic builds). As far as I’m aware, this is the only way to sustainably operate a monorepo, but I would be happy to learn more if someone has other solutions.
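To make "knows which targets are affected" concrete, here is a small sketch of the general idea: walk the declared dependency graph in reverse from whatever changed. This is not Blaze's or Bazel's actual implementation or API; the function `affected_targets` and the target labels are hypothetical.

```python
from collections import deque


def affected_targets(deps, changed):
    """Return every target that transitively depends on a changed target.

    deps maps each target to the targets it directly depends on.
    """
    # Invert the graph: dependency -> targets that depend on it.
    reverse_deps = {}
    for target, direct in deps.items():
        for dependency in direct:
            reverse_deps.setdefault(dependency, set()).add(target)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in reverse_deps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # Hypothetical graph; note the dependencies cross directory boundaries.
    deps = {
        "//app:server": ["//lib:auth", "//lib:log"],
        "//app:cli": ["//lib:log"],
        "//lib:auth": ["//lib:crypto"],
        "//lib:log": [],
        "//lib:crypto": [],
    }
    print(affected_targets(deps, ["//lib:crypto"]))
```

Because the walk follows declared edges rather than directory layout, a change under one directory can trigger rebuilds elsewhere, which is the case a plain "rebuild this directory" rule misses.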
I assume you mean third party dependencies that are not in the monorepo? Pretty much yes, monorepos struggle if they are expected to handle dependencies that aren't stored in the monorepo, so step 1 of using a dependency from outside of a monorepo should be to copy the source into the monorepo (and transitively copy the source of dependencies, etc).
Full dependency tree yep. No build in google's main repo ever retrieves code externally.
It's version control, not necessarily just source control! If something could benefit from being versioned, why would you not check it in? You then guarantee everybody has the same version. That's exactly what this thing is there for.

Git's design can limit its usefulness in this respect - though perhaps you could solve this to some extent with git LFS? - but not all version control systems have this problem.

git annex (or git LFS, if you buy into github's NIH) is requisite if you want to use git like this, broadly. git will happily store any and all binaries you ask it to, but upon a (blind) clone, it will grab every single revision of said binary, taking up however much space that takes.

(partial clones avoid this, but, as git isn't designed for this use case, grabbing all of history happens far too easily.)