by hliyan 2614 days ago
Are we mistaking a dependency control problem for a revision control problem?

In a previous life, before microservices, CI/CD etc. existed, we did just fine with 20-30 CVS repositories, each representing a separate component (a running process) in a very large distributed system.

The only difference was that we did not have to marshal a large number of 3rd party dependencies that were constantly undergoing version changes. We basically relied on C++, the Standard Template Library and a tightly version-controlled set of internal libraries with a single stable version shared across the entire org. The whole system would have been between 750,000 and 1,000,000 lines of code (libraries included).

I'm not saying that that's the right approach. But it's mind-boggling to me that we can't solve this problem easily anymore.

5 comments

My preferred approach for a microservice architecture:

- Contract-first API development

- All API contract definition files (OpenAPI/Swagger, .proto, .wsdl...) in a single repo, which has a CI/CD pipeline to bundle them into artifacts for various platforms (Maven, NuGet, NPM, gem...)

- Consumers and producers import the "api-contracts" dependency; this is the only coupling between components

- Consumers and producers both generate necessary code (server stubs, client libraries) at build time

IMHO, if your service clients have dependencies on implementations of APIs rather than just the definitions, you're not realizing the key benefit of microservices (or SOA).
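
To make the "only coupling is the contract" idea concrete, here is a minimal sketch. It is an illustration under assumptions, not a real pipeline: in practice the contract types would be generated at build time from OpenAPI or .proto files, whereas here a hand-written `Protocol` stands in for them. The names `Order`, `OrderService`, `InMemoryOrderService` and `format_receipt` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


# --- "api-contracts" package (hypothetical; stands in for generated code) ---
@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class OrderService(Protocol):
    """The contract: the only thing producers and consumers both import."""

    def get_order(self, order_id: str) -> Order: ...


# --- producer side: implements the contract, never imported by consumers ---
class InMemoryOrderService:
    def __init__(self) -> None:
        self._orders = {"o-1": Order("o-1", 1250)}

    def get_order(self, order_id: str) -> Order:
        return self._orders[order_id]


# --- consumer side: typed against the contract, not the implementation ---
def format_receipt(service: OrderService, order_id: str) -> str:
    order = service.get_order(order_id)
    return f"{order.order_id}: ${order.total_cents / 100:.2f}"


if __name__ == "__main__":
    print(format_receipt(InMemoryOrderService(), "o-1"))
```

The consumer function never names `InMemoryOrderService`, so the producer can be rewritten freely as long as it still satisfies `OrderService`.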

I agree with your last point in theory, but in practice consumers start to rely on bugs and implementation details, and eventually it is easier to change the contract than to fix the clients.
Hyrum's law and all.

But I don't think that's what he was saying so much as that your clients shouldn't depend on the server code, only the API definition. Which is true and possible in general.

That is what I was saying; but [s]he's right that implementation details always seep in by way of assumptions that clients make. It's extremely expensive to write an API definition that encompasses every possible edge case, probably only feasible in certain life/money-critical applications.

Yeah, that is always a danger. From a purist standpoint, I'd argue that any behavior not defined explicitly in the API contract is subject to change at any time, and clients relying on it are by definition buggy. But I recognize that that often doesn't matter when the client code is owned by a team under a director with more clout than yours, a valuable customer, etc.

One possible solution would be to bump the major version (assuming semver) of the API contract, and support multiple versions of the API simultaneously. Of course, that has its own challenges and costs.
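
A rough sketch of what serving two major versions side by side could look like, assuming the client sends the contract version it was built against. The handler names, the response shapes and the `handle_get_user` dispatcher are all made up for illustration; a real service would more likely route on a URL prefix or header.

```python
def parse_major(version: str) -> int:
    """Extract the major component from a semver string such as "2.1.0"."""
    return int(version.split(".")[0])


def get_user_v1(user_id: int) -> dict:
    # v1 contract (hypothetical): a single "name" field.
    return {"id": user_id, "name": "Ada Lovelace"}


def get_user_v2(user_id: int) -> dict:
    # v2 contract (hypothetical): breaking change, name split into two fields.
    return {"id": user_id, "first_name": "Ada", "last_name": "Lovelace"}


# One handler per supported major version; minor/patch bumps reuse a handler.
HANDLERS = {1: get_user_v1, 2: get_user_v2}


def handle_get_user(contract_version: str, user_id: int) -> dict:
    major = parse_major(contract_version)
    if major not in HANDLERS:
        raise ValueError(f"unsupported contract major version: {major}")
    return HANDLERS[major](user_id)


if __name__ == "__main__":
    print(handle_get_user("1.4.2", 7))
    print(handle_get_user("2.0.0", 7))
```

The cost mentioned above shows up directly: every entry in `HANDLERS` is code that has to be tested and kept alive until the last client on that major version migrates.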

I dunno, the only 3rd party over-dependencies I see are in frontend code, and that code is usually in one big repo (in my personal experience). I think the proliferation of NPM dependencies is its own problem, but usually when I'm thinking of mono-repo vs multi-repo it's because teams/repos are having trouble coordinating with each other, not because some NPM library hasn't been updated lately.

Most of the mono-repo advocates aren't talking about dependency management. They are talking about the advantages around continuous integration that a good mono-repo tool can bring.

The article actually complains less about mono-repos and more about mono-repos on Git and the associated tooling around Git.

> Most of the mono-repo advocates aren't talking about dependency management.

The article, however, states dependency management as the main complaint, to the point that it's mentioned immediately after the first point where monorepos are mentioned.

Yeah, I found the article to be more about the deficiencies of Git for things version control isn't meant to solve and less about the problems with mono-repos themselves.

You’re not wrong. Part of it is people's willingness to reach for a dependency just to avoid writing a few lines of code.

It would be nice if there was a tool that could help you identify just how much of each dependency you actually depend on so you could trim it.

These things all exist if you use something like bazel/pants/buck to manage your dependencies. When you can construct a DAG of the entire dependency structure you can see exactly how much you depend on any given thing (and get fun dot-graphs of it!). But that requires being precise with dependency declaration in a way that a lot of people don't want to be.
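
As a toy illustration of the DAG idea (this is not how bazel/pants/buck are implemented, and the target names are invented), precise per-target dependency declarations are enough to answer "what do I pull in?" and "who depends on this?", and to emit Graphviz DOT text:

```python
from collections import deque

# Hypothetical build graph: each target lists the targets it depends on directly.
DEPS = {
    "app": ["http_client", "json_utils"],
    "http_client": ["sockets", "json_utils"],
    "json_utils": ["strings"],
    "sockets": [],
    "strings": [],
    "unused_lib": ["strings"],
}


def transitive_deps(graph, target):
    """Return every target reachable from `target`, excluding `target` itself."""
    seen = set()
    queue = deque(graph[target])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph[node])
    return seen


def reverse_deps(graph, target):
    """Return every target that depends on `target`, directly or transitively."""
    return {name for name in graph if target in transitive_deps(graph, name)}


def to_dot(graph):
    """Render the graph as Graphviz DOT text, one edge per declared dependency."""
    lines = ["digraph deps {"]
    for name, deps in sorted(graph.items()):
        for dep in deps:
            lines.append(f'  "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sorted(transitive_deps(DEPS, "app")))
    print(sorted(reverse_deps(DEPS, "strings")))
    print(to_dot(DEPS))
```

Here `unused_lib` never appears in the transitive closure of `app`, which is exactly the kind of thing that becomes visible once the declarations are precise.
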
> But that requires being precise with dependency declaration in a way that a lot of people don't want to be.

Some programming language stacks already fix that problem in a transparent way. Take Microsoft's .NET Core + NuGet stack. Developers can add packages to a project without specifying a version number (implicitly it's the latest release) and dependencies are checked when all dependencies are restored.

IIRC Rust's cargo also follows a similar approach, and so do npm and yarn. So, that's pretty much standard at this point.

Is there something that is akin to development-time tree-shaking (as opposed to build time)? i.e. you pull a copy of the specific library functions directly into your source?
This is called “vendoring” your dependencies (taking a snapshot into your SCM), and has been common practice for about 30 years, long before NPM and other language-specific package managers.

Tools for managing vendor branches or sub-trees abound, but good old svn:externals and scripts work for most use cases.

One of the Go proverbs is "A little copying is better than a little dependency." [1]

Also, Go vendoring tools usually trim the repos down to the packages you import.

[1]: https://go-proverbs.github.io/

Monorepos are used because of internal dependencies, but there are already very good solutions for that. As an org we have a lot of projects (50+), but we also split out common functionality (where it makes sense) into components which are shared between projects. How do we share those? In our case (.NET) we have an internal NuGet source which contains the components in question. Each project can upgrade to a later version of a component on its own schedule, just like 3rd party dependencies are updated when necessary.

It does not have to be complicated.

The article also points to the issue of multiple repository management, and even includes links to three possible options to solve it.

The questions around "which repositories do I need?" and "how do I update all of them?" and "how do I make an atomic transaction [commit, branch, PR] across all of them?" are interesting questions in a multi-repo situation, but there are plenty of possible answers as well.

Some of them are just social in nature (read the README, watch/follow the whole GitHub organization, etc.), so they aren't as interesting technically as monorepo or "meta-repo" tools.