by klodolph 2722 days ago
I feel like you've really done no work supporting your argument there. "Slow and inefficient"... what, exactly, is slow and inefficient? Because there are plenty of things slow and inefficient about polyrepos.

I'd say that open-source best practices for shared libraries are appropriate if you're making an open-source shared library. However, these practices are inappropriate for internal libraries, proprietary libraries, and other use cases. In my experience, it's also far from "problem solved". You can point your finger at semantic versioning, but in the meantime we go through hell and back with package managers trying to manage transitive library dependencies and it SUCKS. Why, for example, do you think people got fed up with NPM and created Yarn? Or why do people constantly complain about Pip / Pipenv and the like? Why was the module system in Go 1.11 such a big deal? The answer is that it's hard to follow best practices for shared libraries, and even when you do follow best practices, you end up with mistakes or problems. These take engineering effort to solve. One of the solutions available is to use a monorepo, which doesn't magically solve all of your problems; it just solves certain problems while creating new ones. You have to weigh the pros and cons of the approaches.

In my experience, the many problems with polyrepos are mostly replaced with the relatively minor problems of VCS scalability and a poor branching story (mostly for long-running branches).


> However, these practices are inappropriate for internal libraries, proprietary libraries, and other use cases.

Why do you say so?

Basically because for certain projects and teams, the effort to package internal / proprietary libraries and other similar dependencies can be much larger than the benefit. Packaging is effort. You decide to cut a release, stamp a version number, write a changelog, package and distribute it, and then backport fixes into a long-running branch.

This effort makes a lot of sense if your consumers are complete strangers who work for other organizations. If your consumers are in the same organization, then there are easier ways to achieve similar benefits. See Conway’s Law. It’s not an accident that code structure reflects the structure of the organization that created it, and I would claim that organizational boundaries should be reflected in code. Introducing additional boundaries between members of the same organization should not be done lightly.

One of the main benefits of version numbers is that it tells your consumers where the breaking changes are, but if you have direct access to your consumers’ code and can commit changes, review them, and run their CI tests, then you have something much better than version numbers. If you are running different versions of various dependencies, you can potentially have a combinatorial explosion of configurations. Then there’s the specter of unknown breaking changes being introduced into libraries. It happens, and you can’t avoid it without spending an unreasonable amount of engineering effort, but the monorepo does make the changes easier to detect (because you can more easily run the tests of downstream dependents before committing).
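
To put rough numbers on both points, here is a small Python sketch. Everything in it is made up for illustration: the library names, the version lists, and the dependency graph are hypothetical, and a real build system would do this for you. It counts how many configurations exist when versions can be mixed freely, and it finds which targets would need their tests rerun when one library changes.

```python
from collections import deque
from math import prod

# Hypothetical data: which versions of each internal library are in use somewhere.
VERSIONS_IN_USE = {
    "libauth": ["1.2", "1.3", "2.0"],
    "libdb": ["4.0", "4.1"],
    "liblog": ["0.9", "1.0"],
}

# Hypothetical dependency graph: target -> the things it depends on.
DEPS = {
    "api": ["libauth", "libdb"],
    "worker": ["libdb"],
    "cli": ["api"],
    "libauth": ["liblog"],
    "libdb": ["liblog"],
}


def config_count(versions_in_use):
    """Upper bound on distinct configurations if versions can be mixed freely."""
    return prod(len(versions) for versions in versions_in_use.values())


def downstream_of(changed, deps):
    """Return every target that depends on `changed`, directly or transitively."""
    # Invert the graph so we can walk from a library to its consumers.
    consumers = {}
    for target, needs in deps.items():
        for need in needs:
            consumers.setdefault(need, set()).add(target)

    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in consumers.get(node, ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


if __name__ == "__main__":
    print(config_count(VERSIONS_IN_USE))
    print(sorted(downstream_of("libdb", DEPS)))
```

With a single version of everything at head, the first number collapses to one, and the second function is roughly the question a monorepo build tool answers before a commit lands.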

Cross-cutting changes are also much more likely for certain types of projects. These are difficult with polyrepos for obvious reasons (most notably, the fact that you can’t do atomic commits across repos).

Packaging systems also have administrative overhead. If you shove everything in a monorepo you can ditch the packaging system and spend the overhead elsewhere. These days it’s simple enough to shove everything in the same build system.

Various companies that I’ve worked for have experimented with treating internal libraries the same way that public libraries are treated—with releases and version numbers. Most of them abandoned the approach and reallocated the effort elsewhere. The only company that I worked for that continued to use internal versioning and packaging was severely dysfunctional. One startup I worked for went all in on the polyrepo approach and it was a goddamn nightmare of additional effort, even though there were only like three engineers.

I broadly agree with all of this, though I think it's possible to simplify the business of packaging and releasing with the right automation. But lowering the cost doesn't change the more important question of whether that cost is worth bearing.

> One of the main benefits of version numbers is that it tells your consumers where the breaking changes are, but if you have direct access to your consumers’ code and can commit changes, review them, and run their CI tests, then you have something much better than version numbers.

A small peeve of mine: Semver and version numbers generally are lossy compression. They try to squeeze a wide range of information into a very narrow space, for no other reason than tradition.
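
A rough illustration of what I mean by lossy, as a Python sketch. The `Change` record, the symbol names, and the bump rule are my own simplification of the usual major/minor/patch convention, not any particular tool's behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    kind: str    # "breaking", "feature", or "fix"
    symbol: str  # hypothetical name of the affected API


def next_version(current, changes):
    """Collapse a whole changeset into one (major, minor, patch) bump."""
    major, minor, patch = current
    kinds = {change.kind for change in changes}
    if "breaking" in kinds:
        return (major + 1, 0, 0)
    if "feature" in kinds:
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


# Two very different hypothetical releases.
RELEASE_A = [Change("breaking", "parse_config")]
RELEASE_B = [
    Change("breaking", "connect"),
    Change("feature", "retry"),
    Change("fix", "close"),
]

if __name__ == "__main__":
    # Both collapse to the same version number.
    print(next_version((1, 4, 2), RELEASE_A))
    print(next_version((1, 4, 2), RELEASE_B))
```

A consumer who only calls `connect` is untouched by the first release and broken by the second, but the version number alone cannot tell them which one they got.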

I really don't understand what you describe as effort or a huge burden. Writing a simple script that can solve your releasing tasks is simple. IMO a lot of engineers just want to write code, but a lot of the time building software consists of other things too, such as testing, releasing, documentation, etc. Simply avoiding them doesn't make it better.

If you think that releasing comes down to a simple script then you and I have radically different ideas about what it means to release something.

I’m also completely baffled by your statement that “simply avoiding them doesn’t make it better.” Reading that statement, I can only feel that I have somehow failed to communicate something and I’m not really sure what, because it seems obvious to me why the premise of this statement is wrong. When you avoid performing a certain task, like releasing software, which costs some number of work hours, you can reallocate those work hours to other tasks. It’s not just that the tasks of releasing and versioning stop happening; you also get additional hours to accomplish other things, which may be more valuable. So it’s never an issue of “simply avoiding” some task, at least on functional teams. The issue is choosing between alternatives.

And it should also be obvious that cutting discrete releases for internal dependencies is not an absolute requirement, but a choice that individual organizations make depending on how they see the tradeoffs or their particular culture.

There really are many different ways to develop software, and I’ve seen plenty of engineers get hired and completely fail to adapt to some methodology or culture that they’re not used to. The polyrepo approach with discrete releases cut with version numbers and changelogs is a very high visibility way of developing software and it works very well in the open source world, but for very good reasons many software companies choose not to adopt these practices internally. It’s very sad when I see otherwise talented engineers leave the team for reasons like this.