by klodolph 1230 days ago

> The next deploy took our service down.

How would multi-repo change this? A dependency updated, and code broke, and the new version was broken—but you update dependencies in multi-repo anyway, and deployments can be broken anyway. I don’t see how multi-repo mitigates this.

> It encourages poor API contracts because it lets anyone import any code in any service arbitrarily.

This has nothing at all to do with monorepos. Google’s own software is built with a tool called Bazel, and Meta has something similar called Buck. These tools let you build the same kind of fine-grained boundaries that you would expect from packaged libraries. In fact, I’d say that the boundaries and API contracts are better when you use tools like Bazel or Buck—instead of just being stuck with something like a private/public distinction, you basically have the freedom to define ACLs on your packages. This is often way too much power for common use cases but it is nice to have it around when you need it, and it’s very easy to work with.

A common way to use this—suppose you have a service. The service code is private, you can’t depend on it. The client library is public, you can import it. The client library may have some internal code which has an ACL so it can only be imported from the client library front-end.
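
A minimal sketch of that layout as a toy model in Python. This is not Bazel's implementation: the `Target` class, `may_depend` function, and all labels are hypothetical, and real Bazel visibility entries use forms like `//pkg:__pkg__` and `//pkg:__subpackages__`, which are simplified here to plain package names.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """A toy build target with a Bazel-style visibility list (hypothetical model)."""
    name: str
    # Packages allowed to depend on this target; "//visibility:public" means anyone.
    visibility: list = field(default_factory=list)


def package_of(label: str) -> str:
    """Return the package part of a label such as '//svc/client:lib'."""
    return label.split(":")[0]


def may_depend(importer: str, dep: Target) -> bool:
    """True if the target labelled `importer` is allowed to depend on `dep`."""
    if "//visibility:public" in dep.visibility:
        return True
    # Targets in the same package can always see each other.
    if package_of(importer) == package_of(dep.name):
        return True
    return package_of(importer) in dep.visibility


# Hypothetical layout: private server, public client, ACL'd client internals.
server = Target("//svc/server:impl")  # empty visibility list: private
client = Target("//svc/client:lib", ["//visibility:public"])
internal = Target("//svc/client/internal:wire", ["//svc/client"])
```

With this model, an application target such as `//app:main` can depend on `client` but not on `server` or `internal`, while `//svc/client:lib` can depend on `internal`.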

Here’s how we updated services—first add new functionality to the service. Then make the corresponding changes to the client. Finally, push any changes downstream. The service may have to work with multiple versions of the client library at any time, so you have to test with old client libraries. But we also have a “build horizon”—binaries older than some threshold, like 90 days or 180 days or something, are not permitted in production. Because of the build horizon, we know that we only have to support versions of the client library made within the last 90 or 180 days or whatever.
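
The build horizon arithmetic can be sketched as follows. The function name, the 90-day default, and the sample build dates are assumptions made for illustration; the real policy and tooling are not described in this comment.

```python
from datetime import date, timedelta


def supported_client_builds(build_dates, today, horizon_days=90):
    """Return the client build dates that are still inside the build horizon."""
    cutoff = today - timedelta(days=horizon_days)
    # Anything built before the cutoff is not allowed in production,
    # so the service no longer has to stay compatible with it.
    return [d for d in build_dates if d >= cutoff]


# Hypothetical client library build dates.
builds = [date(2023, 1, 5), date(2023, 3, 20), date(2023, 5, 1)]
still_supported = supported_client_builds(builds, today=date(2023, 6, 1))
```

Here the cutoff falls on 2023-03-03, so only the March and May builds remain in the set the service has to be tested against.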

This is for services with “thick clients”—you could cut out the client library and just make RPCs directly, if that was appropriate for your service.

> It encourages a ton of code churn with very low signal.

The places I worked at that had monorepos, you might filter out the automated code changes there to do automated migrations to new APIs. One PR per week sounds pretty manageable, when spread across a team.

Then again, I’ve also worked at places where I had a high meeting load, and barely enough time to get my work done, so maybe one PR per week is burdensome if you are scheduled to death in meetings.

lopkeny12ko 1230 days ago

> How would multi-repo change this? A dependency updated, and code broke, and the new version was broken—but you update dependencies in multi-repo anyway, and deployments can be broken anyway. I don’t see how multi-repo mitigates this.

In a multi-repo world, I control the repo for my own service. For a business-critical service in maintenance mode (with no active feature development), there's no reason for me to upgrade the dependencies. Code changes are the #1 cause of incidents; why fix something that isn't broken?

We would have avoided this problem had we not migrated to the monorepo simply because, well, we would have never pulled in the dependency upgrade in the first place.

> In fact, I’d say that the boundaries and API contracts are better when you use tools like Bazel or Buck

I'm familiar with both of these tools, and I agree with this point. However, you are making the implicit assumptions that 1. the monorepo in question is built with a tool like Bazel that can enforce code visibility, and 2. that there exists a team or group of volunteers to maintain such a build system across the entire repo. I suspect neither of these is true for the vast majority of codebases outside of FAANG.

> The places I worked at that had monorepos, you might filter out the automated code changes there to do automated migrations to new APIs

Sure, this solves a logistical problem, but not the underlying technical problem of low-signal PRs. I would argue that doing this is an antipattern because it desensitizes service owners to reviewing PRs.

brianwawok 1230 days ago

Not updating old libraries is how you end up getting known security vulns years after they are patched.

lopkeny12ko 1230 days ago

You should ask your colleagues who work in critical industries like banking and healthcare how much of their software stack depends on things that haven't been patched in more than 20 years ;)

lavishlatern 1230 days ago

I've worked in healthcare, and I find the software practices absolutely atrocious. The consequence of this has been ransomware attacks:

https://www.usnews.com/news/health-news/articles/2022-10-10/...

https://www.bloomberg.com/news/features/2023-02-03/ireland-h...

https://floridapolitics.com/archives/585686-tallahassee-memo...

https://www.oceancitytoday.com/news/atlantic-general-hospita...

throwaway2037 1230 days ago

    "critical industries like banking and healthcare".

What a red herring. This comment reads like ChatGPT was trained on Reddit forums. 99% of the software in those industries runs "inside the moat" where security doesn't matter. I am still running log4j from 10 years ago in lots of my stack, and it is the Swiss cheese of software security! Who cares! It works! I'm inside the moat! If people want to do dumb black hat stuff, they get fired. Problem solved.

Also what does "banking" mean anyway? That comment is so generic as to be meaningless. If you are talking about Internet-facing retail banks in 2023, most are very serious about security... because regulations, and giant fines when they get it wrong. And if the fines aren't large enough in your country, tell your democratically elected officials to 10x the fines. It will change industry behaviour instantly -- see US investment banks' risk taking after the Volcker Rule/Dodd-Frank regulations.

saagarjha 1230 days ago

> If people want to do dumb black hat stuff, they get fired.

Firing people doesn’t get you un-hacked. When your risk model involves threats coming from the inside (and at sufficient scale and value it definitely should) then you want to harden things internally too.

dx034 1229 days ago

I don't think it matters much if you're inside the moat. Running vulnerable software inside the moat makes it very easy for an attacker to move laterally once they're in. Patching everything where possible reduces the blast radius of an attack massively.

blandflakes 1230 days ago

Both are industries that are notorious for poor security hygiene, so I'm not sure this is the coup you were looking for.

precommunicator 1230 days ago

Healthcare developer here, developing for the German market. We've used Java preview features and unstable React versions many times before. And we literally have two different roles on our team for upgrading vulnerable dependencies whenever we get an alert.

geraldwhen 1230 days ago

I know devs in several and you can’t even deploy to QA if a dependency has a known vulnerability.

klodolph 1229 days ago

> In a multi-repo world, I control the repo for my own service. For a business-critical service in maintenance mode (with no active feature development), there's no reason for me to upgrade the dependencies. Code changes are the #1 cause of incidents; why fix something that isn't broken?

This is now the “what is code rot?” discussion, which is an incredibly deep and nuanced discussion and I’m not going to do it justice here.

Just to pick an example—if you have an old enough version of your SSL library, it won’t be able to connect to modern web servers, depending on the configuration (no algorithms in common). If you have old database software, maybe it won’t work with the centralized backup service you have. If your software is stuck on 32-bit, maybe you run out of RAM, or maybe the vendors stop supporting the hardware you need. If you need old development tools to build your software, maybe the developers won’t be able to make changes in the future when they actually become necessary. What if your code only builds with Visual Studio 6.0, and you can’t find a copy, and you need to fix a defect now?
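
The "no algorithms in common" failure can be sketched as a simple list intersection. This is only an illustration of the idea: real TLS negotiation also involves protocol versions, extensions, and key exchange parameters, and the `negotiate` function and the suite lists below are assumptions chosen for the example.

```python
def negotiate(client_suites, server_suites):
    """Pick the first server-preferred suite the client also offers, else None."""
    offered = set(client_suites)
    for suite in server_suites:
        if suite in offered:
            return suite
    # Nothing in common: the handshake cannot proceed.
    return None


# Illustrative lists: an old client that only knows legacy suites,
# and a server configured to accept only modern ones.
old_client = ["TLS_RSA_WITH_RC4_128_SHA", "TLS_RSA_WITH_3DES_EDE_CBC_SHA"]
modern_server = ["TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"]

result = negotiate(old_client, modern_server)
```

In this sketch `result` is `None`, which is the situation described above: the old library has nothing to offer that the server will accept, and the only fix is an upgrade.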

As much as I like the idea of building software once and then running the same version for an eternity, I prefer the idea of updating dependencies and spending some more time on maintenance. I advocate for a certain minimum amount of code churn. If the code churn falls too low, you end up getting blindsided by problems and don’t have any expertise to deal with them. My personal experience is that the amount of time you spend on maintenance with changing dependencies doesn’t have to be burdensome, but it’s project-dependent. Some libraries you depend on will cause headaches when you upgrade, some won’t. Good developers know how to vet dependencies.

If you really need an old version of a dependency, you can always vendor an old version of it.

If you can’t afford to put developers on maintenance work and make changes to old projects, then maybe those projects should move from maintenance to sunset.

> that there exists a team or group of volunteers to maintain such a build system across the entire repo

Bazel doesn’t require a whole team. My personal experience is that a lot of teams end up with one or two people who know the build system and know how to keep the CI running, and I don’t think this changes much with Bazel.

Bazel is actually very good at isolating build failures. You can do all sorts of things that break the build and it will only affect certain targets. It is better at this than a lot of other tools.
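
The isolation idea can be sketched with a toy dependency graph: a broken target only affects the targets that transitively depend on it. This is a simplified model, not how Bazel works internally, and the `affected_targets` function and all labels are hypothetical.

```python
from collections import deque


def affected_targets(deps, broken):
    """Return every target that transitively depends on `broken`, plus `broken` itself."""
    # Invert the edges: for each target, record who depends on it.
    rdeps = {}
    for target, direct in deps.items():
        for d in direct:
            rdeps.setdefault(d, set()).add(target)

    # Breadth-first walk up the reverse dependency edges.
    seen = {broken}
    queue = deque([broken])
    while queue:
        current = queue.popleft()
        for parent in rdeps.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical graph: target -> its direct dependencies.
deps = {
    "//app:a": ["//lib:x"],
    "//app:b": ["//lib:y"],
    "//lib:x": ["//lib:base"],
    "//lib:y": [],
    "//lib:base": [],
}
```

Breaking `//lib:x` in this graph affects only `//lib:x` and `//app:a`; `//app:b` and `//lib:y` still build.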

> underlying technical problem of low-signal PRs

I honestly don’t see a few low-signal PRs as a problem. PRs are not there to provide “signal”, they are just there to logically group changes.

The kind of PRs that I see go by in monorepos are things like “library X has deprecated interface Y, and the maintainers of X are migrating Y to Z so they can remove Y later”. Maybe your experience is different.

I do think that owners should not feel that they need to carefully review every PR that touches the code that they own. This is, IMO, its own anti-pattern—your developers should build processes that they trust, monitor the rate at which defects get deployed to production, and address systemic problems that lead to production outages. Carefully reviewing each PR only makes sense in certain contexts.

If you’re working in some environment where you do need that kind of scrutiny for every change you make, then you are probably also in an environment where you need to apply that scrutiny to dependencies. Maybe that means your code has fewer dependencies and relies more on the stdlib.
