Hacker News
by ejcx 3172 days ago
If you're using microservices and care about security, do yourself a favor and use a monorepo.

A lot of improving security is about changing things in small ways but across the entire fleet. If you have microservices without a monorepo you oftentimes need to make the same changes in potentially hundreds of places.

A monorepo makes it a lot easier to enforce standards across repos: code coverage, testing, unsafe function use. Repo sprawl makes microservice security very challenging, and it isn't mentioned in this blog post. Losing track of services and leaving specific services behind is not good.
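
As a rough illustration of the fleet-wide "unsafe function use" check, here is a minimal Python sketch that scans every service directory in one repo for banned calls. The `services/<name>/` layout, the `BANNED_CALLS` list, and the name `find_banned_calls` are all assumptions made up for the example. It is a naive text match, so it will also flag hits inside comments and strings.

```python
import re
from pathlib import Path

# Hypothetical deny-list; a real one would come from your security team.
BANNED_CALLS = ("eval", "exec", "pickle.loads")

# Match a banned name followed by "(", unless it is preceded by an
# identifier character or a dot (so "literal_eval(" and "self.eval(" pass).
_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(name) for name in BANNED_CALLS) + r")\s*\("
)


def find_banned_calls(repo_root):
    """Return (service, relative path, line number, call) tuples in path order."""
    root = Path(repo_root)
    hits = []
    # Assumed layout: <repo_root>/services/<service-name>/**/*.py
    for source in sorted(root.glob("services/*/**/*.py")):
        relative = source.relative_to(root)
        service = relative.parts[1]
        for lineno, line in enumerate(source.read_text().splitlines(), start=1):
            for match in _PATTERN.finditer(line):
                hits.append((service, relative.as_posix(), lineno, match.group(1)))
    return hits
```

With one repo this is a single CI job; with hundreds of repos the same check has to be wired into each of them separately.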

7 comments

> If you're using microservices and care about security, do yourself a favor and use a monorepo.

This seems like a strong reminder that "microservices" aren't really about having lots of independent little systems but are a different way of factoring your one big system.

It's like FizzBuzz - do you handle the 3 first or the 5?
You handle the 15 first to avoid the accumulator requirement.
My 15 falls through both the 3 and the 5 paths. So I must do 3 first, or 15 will be BuzzFizz.
No, I'm saying just do a switch with four branches, %15, %5, %3, default, and break all of them. That way you're explicit about the ambiguous case and you avoid string concat or stateful stdout logic.
I think the point was that you do

If n%15==0

First, then the others. But it’s a pedantic point anyway (like this one)

Or you can print Fizz and Buzz in independent statements, OR that against printing the raw number, and then print a newline for each number at the end.
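
For reference, the four-branch version suggested a few comments up might look like this in Python. Nobody in the thread named a language, so the if/elif chain standing in for a switch and the function name `fizzbuzz` are assumptions. Testing 15 first makes the overlapping case explicit, so there is no string concatenation and no accumulator.

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n, testing the combined case first."""
    if n % 15 == 0:  # divisible by both 3 and 5: handled explicitly
        return "FizzBuzz"
    if n % 5 == 0:
        return "Buzz"
    if n % 3 == 0:
        return "Fizz"
    return str(n)  # default branch: the raw number


if __name__ == "__main__":
    for i in range(1, 16):
        print(fizzbuzz(i))
```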
yes, yes, a thousand times yes.

We implemented our services like this a few years back. It worked really nicely. But in everything that I have read I have never seen any references to this practice. I didn't even know it was called "monorepo". I was just assuming everybody was using multiple repos and we were weird.

Google and Facebook are the largest examples of monorepos that I'm aware of. Here is a write-up about it overall, though you don't need to be convinced :)

https://danluu.com/monorepo/

Both Google and Facebook are also dealing with such large repos that they've needed to start either customizing or building their own SCMs. MS started the GVFS (Git Virtual File System) project to do a similar thing to suit their needs.

Most people aren't at that scale, but IMO, many benefits people get from monorepos you also get by using GitHub/GitLab with master projects mapping in git repos via submodules.

Anyway, it really sucks to work on extremely large monorepos when you don't have access to the same resources as Google and Facebook. For this reason I'm personally always hesitant to recommend monorepos as the be-all-end-all.

A good number of engineers from Facebook and Google are actively developing Mercurial, even if it gets comparatively less attention from the general public.

Facebook also maintains a set of custom extensions for it [0], and there is an interesting talk about the reasons behind their choice [1].

[0] https://phab.mercurial-scm.org/diffusion/FBHGX/

[1] https://m.youtube.com/watch?v=gOVD-DrUpwQ

There is also Mononoke, a HG server being built in Rust: https://github.com/facebookexperimental/mononoke

(work for FB)

That's really interesting. I've updated Wikipedia to mention this.
The fact that Google and Facebook go to such lengths means that there is a big benefit to using a monorepo. Or perhaps they are stuck with it.

> For this reason I'm personally always hesitant to recommend monorepos as the be-all-end-all.

Maybe not a monorepo, but at least try to avoid having too many repos.

Again, I go back to the sub-module approach with some small amount of automation built around that. Managing sub-modules kinda sucks, but if you use it as a reference to all of the products in production and as a synchronization point for delivery, then it gives you all the benefits of the monorepo, without the performance issues.
> it really sucks to work on extremely large monorepos

What kind of scale are we talking, and what issues do you get?

I combined 50 repos into a monorepo. It sucks because git is very slow and the logs are noisy, but having 50 repos sucks worse. Deploys also take forever, but at least they're atomic now. The real issue was the previous devs copying and pasting codebases because they heard microservices were web scale. Overall I'm happy with the tradeoff: it's allowed me to unify the look and feel of the platform a lot more easily by just doing a platform-wide find and replace, and also to revert across the board if stuff goes wrong. It's also allowed me to start removing duplicate codebases.

Instead of doing 50 repos for 50 apps, a better idea is to do layers. One repo for the backend. One for the frontend. One for the cache layer, etc. If you start separating out your Electron app from your web app into separate repos, you'll find you need even more repos for shared UI elements. It can easily lead to copy-pasting or, worse, an explosion of repos. You don't want to tell your new hires "welcome to ABC corp. As your first task, clone 90 repos and npm install each one". If you're going to do it, at least write a script to set the whole build up in one go.

Also keep in mind the tech giants mostly used monoliths up until thousands of employees before refactoring to microservices. For auth you should probably have every app validate tokens by going out over the network to the auth microservice. This way you can easily switch, for example from JWT to sessions, in one place.
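
On the "write a script to set the whole build up in one go" point, a minimal Python sketch could look like the following. The repo names and URLs in `REPOS` are placeholders, and the function names are made up for the example. It defaults to a dry run that only prints what it would do.

```python
import subprocess
from pathlib import Path

# Hypothetical repo list; in practice this might live in a checked-in manifest.
REPOS = {
    "backend": "git@example.com:abc-corp/backend.git",
    "frontend": "git@example.com:abc-corp/frontend.git",
}


def bootstrap_commands(workdir, repos=REPOS):
    """Build (argv, cwd) pairs for cloning and installing each repo, without running them."""
    commands = []
    for name, url in sorted(repos.items()):
        target = Path(workdir) / name
        commands.append((["git", "clone", url, str(target)], None))
        commands.append((["npm", "install"], str(target)))
    return commands


def bootstrap(workdir, repos=REPOS, dry_run=True):
    """Print each command, and also run it unless dry_run is set."""
    for argv, cwd in bootstrap_commands(workdir, repos):
        print(" ".join(argv), f"(in {cwd})" if cwd else "")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
```

Calling `bootstrap("work")` lists the commands; `bootstrap("work", dry_run=False)` actually clones and installs.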
600k+ files at 12 GB of repo (without history). I've been trying to work out what options we have to get off our old SCM. Right now Git is potentially too slow, and that's just the local system problem. Git LFS works decently well for large files.

I've explored lots of different options, and hope to look at mercurial at some point, but am not hopeful.

Uber as well.
A monorepo is certainly worth the consideration, but there are other options if you leverage CI (such as dependent builds).

I was functioning as an architect at a pretty large company and we used Spring Boot. My team wrote a number of internal starters (and our own parent POM) that all other teams would use, and set it up so that the services would build/test/deploy after our base POM and starters were released. It obviously takes a bit of time and tooling to do that, but it was working quite well for us and still kept us from needing to update the same thing manually in 20 places (that is, until we'd need to release a breaking version).

This is overstating. Monorepos have some terrible trade offs.

I think a monorepo works well for company cultures that have a lot of internal code. It works less well when you have a highly decentralized mode of operating (the whole point of microservices IMO) and a lot of shared, externally written open source code. Repo sprawl isn't an issue if you have known orgs: the security team's CI/CD checks them all and files issues or PRs to them all.

"Care about security" is a bad way to phrase it. Of course everyone cares about security; only the levels and requirements differ. Please note that I'm only responding because this fallacy comes up a lot, and one needs to be transparent about trade-offs when suggesting something like microservices with a monorepo:

* If you have 100 services in a monorepo, then it needs a completely different toolchain like Bazel/Buck (all new) or cmake/qmake/etc. to find out the whole dependency graph, with deep integration with the SCM to rebuild only changes and downstreams, avoiding a 2-hour build, avoiding 10 GB release artifacts, scoped CI/commit builds (i.e. triggering a build for only one changed folder instead of triggering a million tests), independent releases, etc.

* Some more tooling for large repository and LFS management, buildchain optimization, completely different build farm strategies to run tests for one build across many agents, etc

* Making sure people don't create a much worse dependency graph or shared messes, because it's now easier to peek directly into every other module. Have you worked in companies where developers still have trouble with Maven multi-module projects? Now imagine 10x of that. Making sure services don't get stuck on shared library dependencies. You should be able to use Guice 4.0 for one service, 4.1 for another, Jersey 1.x for another, Jersey 2.x for another, etc. Otherwise it becomes an all-or-nothing change, and falls back to being a monolith where services can't evolve without affecting others

* It does not mean it's easy to break compatibility and do continuous delivery (no: there are database changes, old clients, staggered rollouts, rollbacks, etc. Contracts must always be honored, no escaping that; a service has to be compatible with its own previous version for any sane continuous delivery process)
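
The "rebuild only changes and downstreams" requirement in the first bullet boils down to a reverse-dependency walk. Here is a minimal Python sketch, assuming a hand-written `DEPS` map and one directory per target (both hypothetical; tools like Bazel or Buck derive the graph from build files rather than from a dict):

```python
from collections import deque

# Hypothetical graph: each target lists the targets it depends on.
DEPS = {
    "libs/auth": [],
    "libs/http": [],
    "services/billing": ["libs/auth", "libs/http"],
    "services/search": ["libs/http"],
}


def affected_targets(changed_files, deps=DEPS):
    """Return targets that own a changed file, plus everything downstream of them."""
    # Invert the graph: dependency -> targets that depend on it.
    dependents = {target: [] for target in deps}
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents.setdefault(requirement, []).append(target)

    # A target owns a file if the file lives under the target's directory.
    queue = deque(
        target for target in deps
        if any(path.startswith(target + "/") for path in changed_files)
    )
    affected = set(queue)
    while queue:  # breadth-first walk over the reversed edges
        for downstream in dependents[queue.popleft()]:
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return sorted(affected)
```

A commit touching `libs/auth/token.py` would then trigger `libs/auth` and `services/billing` only, and `services/search` is left alone.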

Imagine a monorepo like this: building a new Ubuntu OS release for every Firefox update, and then work backwards, doing it for every single commit. I'm not even scratching the surface of anything here. It changes everything: how you develop, integrate, test, deploy, your git workflows, etc. This is why big monorepo companies like Facebook/Google release things like bittorrent-based deployments, new compression algorithms, etc., because that's the outcome of dealing with a monorepo.

I may go as far as to say this, after many, many journeys:

Monorepo with monolith - natural, lots of community tooling, lots of solved problems.

Multi-repos with multi-services - natural, lots of community tooling, lots of solved problems.

Anything else without the right people who have already done it many times, and you're in for a painful rediscovery journey of what Google/Facebook went through, and this does not have as much knowledgebase/tooling/community/etc as other natural approaches.

Additionally, it's important to eliminate as much duplicate code as possible through shared directories/libraries or else you will end up in file/dll hell.
Sure. We do this with Ansible, and a site.yml mapping roles to systems.

BUT

Changes are hard (tm)

Execution order changes with each change's requirements.

Should I commit this now? What if another teammate executes it now?

It's not easy.