Hacker News
by limsup 3138 days ago
Check in your node_modules. You are badly correcting the wrong problem. The npm registry should not be hit on every checkout, deploy, and CI test run. Just commit your dependencies, and if you need to hack on them, you can float a patch with git.

You're also correcting the wrong problem. You can keep your deps outside your repo and still not go out to the network each time: cache or mirror the npm registry and build from that.

At some point the convenience of public repos made people forget this.
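To make the cache-or-mirror suggestion concrete, here is a minimal read-through cache sketched in Python. The class name `TarballCache`, the on-disk layout, and the `fetch_upstream` callable (a stand-in for downloading from the public registry) are all hypothetical. The integrity string follows the `sha512-<base64>` form that npm lockfiles typically use in their `integrity` field. Upstream is contacted only on a cache miss, so repeat checkouts, deploys, and CI runs don't need the network.

```python
import base64
import hashlib
from pathlib import Path
from typing import Callable


def sri_sha512(data: bytes) -> str:
    # Subresource-Integrity-style string: "sha512-<base64 digest>".
    return "sha512-" + base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


class TarballCache:
    """Hypothetical read-through cache for package tarballs, keyed by name@version."""

    def __init__(self, root: Path, fetch_upstream: Callable[[str, str], bytes]):
        # fetch_upstream(name, version) is an assumed stand-in for a registry download.
        self.root = root
        self.fetch_upstream = fetch_upstream
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str, version: str) -> Path:
        # Scoped names such as "@scope/pkg" contain a slash; flatten it for the filename.
        return self.root / f"{name.replace('/', '__')}-{version}.tgz"

    def get(self, name: str, version: str, integrity: str) -> bytes:
        path = self._path(name, version)
        cached = path.exists()
        # Only go upstream on a miss.
        data = path.read_bytes() if cached else self.fetch_upstream(name, version)
        # Verify integrity before trusting or storing the bytes.
        if sri_sha512(data) != integrity:
            raise ValueError(f"integrity mismatch for {name}@{version}")
        if not cached:
            path.write_bytes(data)
        return data
```

Pointing every developer machine and CI runner at one shared cache like this is the same idea as the team-wide caching mirror or proxy mentioned elsewhere in the thread.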

One important argument against this practice is that node_modules is usually not cross-platform. It's likely that somewhere within your dependency graph, there is a package that contains native code, which results in platform-specific binaries in your git repository.
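One rough way to check whether this applies to a given project is to scan node_modules for signs of native addons. The Python sketch below uses a heuristic that is an assumption, not a complete test: it flags any package directory containing a `binding.gyp` file (node-gyp's build configuration) or a compiled `*.node` addon. The function names are hypothetical, and packages that carry native code in some other form would be missed.

```python
import os
from pathlib import Path


def _package_root(path: Path, node_modules: Path) -> str:
    """Map a directory inside node_modules to the package that owns it."""

    def is_modules_dir(p: Path) -> bool:
        return p == node_modules or p.name == "node_modules"

    current = path
    while current != node_modules and current != current.parent:
        parent = current.parent
        # A package sits directly in a node_modules dir, or in an @scope dir inside one.
        if is_modules_dir(parent) or (
            parent.name.startswith("@") and is_modules_dir(parent.parent)
        ):
            return current.relative_to(node_modules).as_posix()
        current = parent
    return "."


def find_native_packages(node_modules: Path) -> set[str]:
    """Heuristically flag packages under node_modules that carry native code."""
    flagged: set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(node_modules):
        # Assumed markers: node-gyp build config, or a compiled addon binary.
        if "binding.gyp" in filenames or any(f.endswith(".node") for f in filenames):
            flagged.add(_package_root(Path(dirpath), node_modules))
    return flagged


if __name__ == "__main__":
    for pkg in sorted(find_native_packages(Path("node_modules"))):
        print(pkg)
```

Nested copies are reported by their path (for example `some-pkg/node_modules/other-pkg`), which shows where in the dependency graph the platform-specific binaries come from.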
Watch out! The "pure and perfect git history is more important than any other consideration" crowd are going to take you apart for this heretical notion!
More like the "we don't want a 10 GB git repo" crowd, to be honest.

Checking in not only your code but the entire history of every dependency, and of every change within that dependency graph, is an amazingly bad idea when frontend apps often have 600 MB+ of dependencies that change somewhat frequently, and git keeps a version of every file in its history.
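Here is a toy model of why the history, not the current checkout, is what grows. Git stores each distinct file version as a blob named (by default) by a SHA-1 over a `blob <size>\0` header plus the contents, and every blob reachable from history is kept. The Python sketch below (function names are hypothetical) counts unique blob bytes across commits. It ignores zlib compression and packfile deltas, so it overstates real on-disk size; the point is only that every distinct version of a dependency file stays in the repo.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Git's blob naming: SHA-1 of "blob <size>\0" followed by the raw content.
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def checkout_size(tree: dict[str, bytes]) -> int:
    # Bytes in one working tree (a single commit).
    return sum(len(c) for c in tree.values())


def history_size(commits: list[dict[str, bytes]]) -> int:
    # Bytes of unique blobs across all commits; identical content is stored once.
    seen: dict[str, int] = {}
    for tree in commits:
        for content in tree.values():
            seen[blob_id(content)] = len(content)
    return sum(seen.values())


# Usage: two 1,000-byte files; one is rewritten by ten successive dependency updates.
commits = [{"dep/lib.js": b"a" * 1000, "dep/util.js": b"b" * 1000}]
for i in range(1, 11):
    tree = dict(commits[-1])
    tree["dep/lib.js"] = bytes([i]) * 1000  # each update produces new file content
    commits.append(tree)

print(checkout_size(commits[-1]), history_size(commits))  # → 2000 12000
```

The checkout stays at 2,000 bytes while the history holds 12,000, and the gap keeps widening with every update.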

But you're going to have to install them either way. We only update dependencies when they are needed, so in practice this is rarely an issue. We have also gotten into the practice of making separate branches for updates to keep pull requests sane.
Part of my current project has a node_modules folder that is 400 MB and contains 37,853 files. Seeing as updating a dependency can cause a cascading effect that touches a lot of different files, it's not a good idea to store them in git. It's akin to the reason you don't store binary files in git.
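The cascading effect can be sketched as a graph walk. In the worst case, bumping one package changes the resolved version of everything in its transitive dependency subtree, so every file in those packages can change. The graph, package names, and file counts below are made up for illustration, and real upgrades often leave parts of the subtree untouched, so treat the result as an upper bound rather than a prediction.

```python
from collections import deque


def transitive_deps(graph: dict[str, list[str]], start: str) -> set[str]:
    """Every package reachable from `start` (including itself); cycles are handled."""
    seen = {start}
    queue = deque([start])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def worst_case_files_touched(
    graph: dict[str, list[str]], file_counts: dict[str, int], start: str
) -> int:
    # Upper bound: assume every package in the subtree gets a new version.
    return sum(file_counts.get(pkg, 0) for pkg in transitive_deps(graph, start))


# Hypothetical dependency graph (note the b <-> d cycle) and per-package file counts.
graph = {"app-dep": ["a", "b"], "a": ["c"], "b": ["c", "d"], "d": ["b"]}
file_counts = {"app-dep": 12, "a": 40, "b": 300, "c": 5, "d": 900}

print(sorted(transitive_deps(graph, "a")))                       # → ['a', 'c']
print(worst_case_files_touched(graph, file_counts, "app-dep"))  # → 1257
```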

If installs are too slow, switch to a caching yarn/npm mirror. Cloudflare has one. Or run a local caching proxy for your team.

The repo doesn't get big because the dependencies are huge at any one point in time (though they may be); it gets big because they change over time, and the history becomes huge and intensely messy. Shallow clones, branching, and fetch tuning are band-aids: over enough time (which in practice can be as little as a few months on a big project iterated on by tons of people), you will still run into clone/deployment problems, repo storage problems, and lost time. And don't start with the "well then just break it up into multiple git repos!" stuff. That's not only incredibly hard to do in practice, especially in the face of legacy code, but it's fundamentally just another constant-factor mitigation. Sometimes checking in all the deps makes sense. Sometimes it doesn't. Big Node projects are a situation where it often doesn't.