Hacker News new | ask | show | jobs
by pwdisswordfish0 1699 days ago
Skip the nonsense and just check your dependencies in directly to your repo. The separation has no real world gains for developers and doesn't serve anyone except the host of your source repo. As it turns out most people's repo host is also the operator of the package registry they're using, so there aren't even theoretical gains for them, either.

Doing it this way doesn't preclude the ability to upgrade your dependencies, it _completely_ sidesteps the intentional or unintentional desync between a dependency's source and its releases, it means people have to go out of their way to get a deployment that isn't reproducible, and in 4 years when your project has rotted and someone tries to stand it up again even if just temporarily to effect some long-term migration, then they aren't going to run into problems because the packages and package manager changed out from beneath them. I run into this crap all the time to the point that people who claim it isn't a problem I know have to be lying.

8 comments

> I run into this crap all the time to the point that people who claim it isn't a problem I know have to be lying.

I don't think that's right.

Just because someone denies a problem exists—a problem that you know for a fact, with 100% certainty exists—doesn't mean they're lying.

It may mean you know they are wrong, but wrong != lying, and it's a good thing to keep in mind.

If you have external reasons to believe that the person you're talking to should or does know better, then it's fair to say they are lying.

But, in general, if you accuse someone who is simply wrong to be lying, you're going to immediately shut down any productive conversation that you could otherwise have.

People don't do this because `node_modules` can be absolutely massive (hundreds of megabytes or more), and a lot of people don't like (for various reasons) such large repositories.

There is a deprecated project at my work that committed the entire yarn offline cache to the repo. At least those were gzipped, but the repo still had a copy of every version of every dependency.

It isn't a good long term solution unless you really don't care at all about disk space or bandwidth (which you may or may not).

A middle ground that I've seen deployed is corporate node mirrors with whitelisted modules. Then individual repos can just point to the corporate repo. Same thing for jars, python packages, etc.
And build pipelines that fail due to the size of the repo.
Committing node_modules and reproducibility are somewhat not orthogonal though.

You can get reasonable degrees of reproducibility by choosing reasonable tools: Yarn lets you commit their binary and run that in the specified repo regardless of which version you have installed globally. Rush also allows you to enforce package manager versions. Bazel/rules_nodejs goes a step further and lets you pin node version per repo in addition to the package manager. Bazel+Bazelisk for version management of Bazel itself provides a very hermetic setup.

Packages themselves are immutable as long as you don't blow away your lockfile. I used to occasionally run into very nasty non-reproducibility issues with ancient packages using npm shrinkwrap (or worse, nothing at all), but since npm/yarn got lockfiles, these problems largely went away.

These days, the non-hermeticity stuff that really grinds my gears is the very low plumbing stuff. On Mac, Node-GYP uses xcode tooling to compile C++ modules, so stuff breaks with MacOS upgrades. I'm hoping someone can come up with some zig-based sanity here.

As for committing node_modules, there are pros and cons. Google famously does this at scale and my understanding is that they had to invest in custom tooling because upgrades and auditing were a nightmare otherwise. We briefly considered it at some point at work too but the version control noise was too much. At work, we've looked into committing tarballs (we're using yarn 3 now) but that also poses some challenges (our setup isn't quite able to deal w/ a large number of large blobs, and there are legal/auditing vs git performance trade-off concerns surrounding deletion of files from version control history)

Ill-advised tool adoption is exactly the problem I'm aiming to get people to wake up and say "no" to. You need only one version control system, not one reliable one plus one flaky one. Use the reliable one, and stop with the buzzword bandwagon, which is going to be a completely different landscape in 4 years.

> Packages themselves are immutable as long as you don't blow away your lockfile

Lockfiles mean nothing if it's not my project. "I just cloned a 4 year old repo and `npm install` is failing" is a ridiculous problem to have to deal with (which to repeat, is something that happens all the time whether people are willing to acknowledge it or not). This has to be addressed by making it part of the culture, which is where me telling you to commit your dependencies comes from.

This problem does happen, but committing node_modules won't fix it. Assuming the npm registry doesn't dissapear, npm will download the exact same files you would have committed to your repo. Wherever those files came from, 4 years later you upgraded your OS, and now the install step will fail (in my experience usually because of node-gyp).

Unless you were talking about committing compiled binaries as well, in which case every contributor must be running the same arch/OS/C++ stdlib version/etc. M1 laptops didn't exist 4 years ago. If I'm using an M1 today, how can I stand up this 4 year old app?

The real problem is reproducible builds, and that's not something git can solve.

Assuming the npm registry doesn't dissapear, npm will download the exact same files you would have committed to your repo

I think that’s where you missed what the gp is saying.

If everything is checked in to source control, npm will have nothing to download. You won’t need to call npm install at all, and if you do it will just return immediately saying everything is good to go already.

The workflow for devs grabbing a 10 year old project is to check it out then npm start it.

You are probably not running the same OS/node/python version as you were 10 years ago. If you were to try this in real life, you'd get an error like this one. https://stackoverflow.com/questions/68479416/upgrading-node-....

The error helpfully suggests you:

>Run `npm rebuild node-sass` to download the binding for your current environment.

Download bindings? Now you're right back where you started. https://cdn140.picsart.com/300640394082201.jpg

Of course if you keep a 10 year old laptop in a closet running Ubuntu 10.04 and Node 0.4, and never update it through the years, then your suggestion will work. But that workflow isn't for me.

> which to repeat, is something that happens all the time

Are you sure this isn't just a problem in your organization? As I qualified, the issue you're describing was a real pain maybe two or three years ago, but not anymore IME. For context, my day job currently involves project migrations into a monorepo (we're talking several hundred packages here) and non-reproducibility due to missing lockfiles is just not an issue these days for me.

As the other commenter mentioned, node-gyp is the main culprit of non-reproducibility nowadays, and committing deps doesn’t really solve that precisely because you often cannot commit arch-specific binaries, lest your CI will blow up trying to run mac binaries

> Are you sure this isn't just a problem in your organization?

I'm really struggling to understand the kind of confusion that would be necessary in order for this question to make sense.

Why do you suspect that this might be a problem "in [my] organization"? How could it even be? When I do a random walk through projects on the weekend, and my sights land on one where `npm install` ends up failing because GitHub is returning 404 for a dependency, what does how things are done in my organization have to do with that?

I get the dreadful feeling that despite my saying "[That] means nothing if it's not my project", you're unable to understand the scope of the discussion. When people caution their loved ones about the risk of being the victim of a drunk driving accident on New Years Eve, it doesn't suffice to say, "I won't drink and drive, so that means I won't be involved a drunk driving accident." The way we interact with the whole rest of the world and the way it interacts with us is what's important. I'm not concerned about projects under my control failing.

> non-reproducibility due to missing lockfiles is just not an issue

Why do you think that's what we're talking about? That's not what we're talking about. (I didn't even say anything about lockfiles until you brought it up.) You're not seeing the problem, because you're insisting on trying to understand it through a peephole.

I mean, of course I'm going to see this from the lenses of my personal experience (which is that nasty non-reproducibility issues usually would only happen when someone takes over some internal project that had been sitting in a closet for years and the original owner is no longer at the company). Stumbling upon reproducibility issues in 4 year old projects on Github is just not something that happens to me (and I have contributed to projects where, say, Travis CI had been broken in master branch for node 0.10 or whatever) and getting 404s on dependencies is something I can't say I've experienced (unless we're talking about very narrow cases like consuming hijacked versions of packages that were since unpublished) or possibly a different stack that uses git commits for package management (say, C) - and even then, that's not something I've run into (I've messed around w/ C, zig and go projects, if it matters). I don't think it's a matter of me having a narrow perspective, but maybe you could enlighten me.

As I mentioned, my experience involves seeing literally hundreds of packages, many of which were in a context where code rot is more likely to happen (because people typically don't maintain stuff after they leave a company and big tech attrition rate is high, and my company specifically had a huge NIH era). My negative OSS experience has mostly been that package owners abandon projects and don't even respond to github issues in the first place. I wouldn't be in a position to dictate that they should commit node_modules in that case.

Maybe you could give me an example of the types of projects you're talking about? I'm legitimately curious.

> I don't think it's a matter of me having a narrow perspective, but maybe you could enlighten me.

I have to think it is, because the "shape" of your responses here have been advice about things that I/we can do to keep my/our own projects (e.g. mission critical or corporate funded stuff being used in the line of business) rolling, and completely neglected the injured-in-collision-with-other-intoxicated-driver aspect.

Again, I _have_ to think that your lack of contact with these problems has something to do with the particulars of your situation and the narrow patterns that your experience falls within. Of the projects that match the criteria, easily 40% of them are the type I described. (And, please, no pawning it off with a response like "must be a bad configuration on your machine"; these are consistent observations over the bigger part of a decade across many different systems on different machines. It's endemic, not something that can be explained away with the must-be-your-org/must-be-your-installation handwaving.)

> code rot is more likely [...] My negative OSS experience has mostly been that package owners abandon projects and don't even respond to github issues

Sure, but the existence of other problems doesn't negate the existence of this class of problems.

> I wouldn't be in a position to dictate that they should commit node_modules in that case.

Which is why I mentioned that this needs to be baked in to the culture. As it stands, the culture is to discourage simple, sensible solutions and prefers throwing even more overengineered tooling that ends up creating new problems of its own and only halfway solves a fraction of the original ones. (Why? Probably because NodeJS/NPM programmers seem to associate solutions that look simple as being too easy, too amateurish—probably because of how often other communities shit on JS. So NPMers always favor the option that looks like Real Serious Business because it either approaches the problem by heaping more LOC somewhere or—even better—it involves tooling that looks like it must have taken a Grown Up to come up with.)

> Maybe you could give me an example of the types of projects you're talking about? I'm legitimately curious.

Sure, I don't even have to reach since as I said this stuff happens constantly. In this case, at the time of writing the comments in question, it was literally the last project I attempted: check out the Web X-Ray repo <https://github.com/mozilla/goggles.mozilla.org/>.

This is conceptually and architecturally a very simple project, with even simpler engineering requirements, and yet trying to enter the time capsule and resurrect it will bring you to your knees on the zeroeth step of just trying to fetch the requisite packages. That's to say nothing of the (very legitimate) issue involving the expectation that even with a successful completion of `npm install` I'd still generally expect one or more packages for any given project to be broken and in need of being hacked in order to get working again, owing to platform changes. (Several other commenters have brought this up, but, bizarrely, they do so as if it's a retort to the point that I'm making and not as if they're actually helping make my case for me... The messy reasoning involved there is pretty odd.)

> At work, we've looked into committing tarballs (we're using yarn 3 now) but that also poses some challenges (our setup isn't quite able to deal w/ a large number of large blobs, and there are legal/auditing vs git performance trade-off concerns surrounding deletion of files from version control history

With Git LFS the performance hit should be relatively minimal (if you delete the file it won't redownload it on each new clone anyways, stuff like this).

And what happens when you need to update those dependencies?

Software is a living beast, you can't keep it alive on 4yr-old dependencies. In fact, you've cursed it with unpatched bugs and security issues.

Yes, keep a separate repo, but also keep it updated. The best approach is to maintain a lag between your packages and upstream so issues like these are hopefully detected & corrected before you update.

> And what happens when you need to update those dependencies?

Then you update them just like you do otherwise, like I already said is possible.

> you can't keep it alive on 4yr-old dependencies. In fact, you've cursed it with unpatched bugs and security issues

This is misdirection. No one is arguing for the bad thing you're trying to bring up.

Commit your dependencies.

Let's say that the day you update your dependencies is after this malware was injected but before it was noticed.

Now you have malware in your local repo :(

Having a local repo does not prevent malware. Your exposure to risk is less because you update your dependencies less frequently, but the risk still exists and needs to be managed. There's no silver bullet.

This is more misdirection. By no means am I arguing that if you're doing a thousand stupid things and then start checking in a copy of your dependencies, that you're magically good. _Yes_ you're still gonna need to sort yourself out re the 999 other dumb things.
Sounds like a great way to end up with what $lastco called "super builds" - massive repos with ridiculous amounts of cmake code to get the thing to compile somewhat reliably. It was a rite of passage for new hires to have them spend a week just trying to get it to compile.

All this does is concentrate what would be occasionally pruning and tending to dependencies to periodic massive deprecation "parties" when stuff flat out no longer works and deadlines are looming.

That’s the whole deal with yarn 2 isn’t it? With their plug’n’play it becomes feasible to actually vendor your npm deps, since instead of thousands upon thousands (upon thousands) of files you only check in a hundred or so zip files, which git handles much more gracefully.

I was skeptical at first as it all seemed like way too much hoops to jump through, but the more I think about it the more it feels that it’s worth it.

> Skip the nonsense and just check your dependencies in directly to your repo.

Haha, no.

That would increase the size of the repository greatly. Ideally, you would want a local proxy where the dependencies are downloaded and managed or tarball the node_modules and save it in some artifacts manager, server, or s3 bucket

What's the problem with a big repository? The files still need to be downloaded from somewhere. It's mostly just text anyway so no big blobs which is usually what causes git to choke.

For that one-off occasion when you are on 3G, have a new computer without an older clone, and need to edit files without compiling the project (which would have required npm install anyway), there is git partial clone.

Does npm have a shared cache if you have several projects using the same dependencies?

>Does npm have a shared cache if you have several projects using the same dependencies?

pnpm does, that's why I'm using it for everything. It's saving me many gigabytes of precious SSD space.

https://github.com/pnpm/pnpm

Now anything with native bindings is broken if you so much as sneeze.