by shadowgovt 1680 days ago
That's going to be incompatible with writing interesting software on the web, unless we want to just hand the problem over to a handful of big players who can afford to hand-vet 10,000 dependencies.

The reason packages are so big is the complexity for an interesting app is irreducible. People don't import thousands of modules for fun; they do it because simple software tends towards requiring complex underpinning. Consider the amount of operating system that underlies a simple "Hello, world!" GUI app. And since the browser-provided abstractions are trash for writing a web app, people swap them out with frameworks.

I'm working on a React app right now where I've imported about a dozen dependencies explicitly (half of which are TypeScript @type files, so closer to a half-dozen). The total size of my `node_modules` directory is closer to a couple hundred packages. It's 35MB of files. And no, I couldn't really leave any of them out to do the thing I want to do, unfortunately.


People oftentimes do this, with suspicious reasoning. Classic examples:

1) "We have is-array as a dependency" Why? Well, pre Array.isArray, there wasn't anything built-in. Why not just write a little utility function which does what is-array does? See #3

2) "We have both joi and io-ts. Don't they do roughly the same thing?" They do; io object validation. New code uses io-ts, but a bunch of old code relies on joi. Should we update it? Eh we'll get around to it (we never do).

3) "is-array is ten lines of code. why don't we just copy-paste it?" Multiple arguments against this, most bad. Maybe the license doesn't support it. More usually; fear that something will change and you'll have to maintain the code you've pasted without the skills to do so. Better to outsource it (then, naturally, discount the cost of outsourcing).

4) "JSON.parse is built-in, but we want to use YAML for this". So, you use YAML. And need a dependency. Just use JSON! This is all-over, not just in serialization, but in UI especially; the cost analysis between building some UI component (reasonably understood cost) versus finding a library for it (poorly understood cost, always underestimated).
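For a sense of scale on points 1 and 3, here is a minimal sketch of what an in-house replacement could look like. The name `isArrayValue` is hypothetical and this is not the source of the is-array package; it assumes you only need the built-in check plus a fallback for runtimes that predate `Array.isArray`.

```typescript
// Hypothetical helper; NOT copied from the is-array package.
function isArrayValue(value: unknown): value is unknown[] {
  // Use the built-in wherever the runtime provides it (ES5 and later).
  if (typeof Array.isArray === "function") {
    return Array.isArray(value);
  }
  // Fallback for very old runtimes: compare the internal class tag.
  return Object.prototype.toString.call(value) === "[object Array]";
}

console.log(isArrayValue([1, 2, 3])); // true
console.log(isArrayValue({ length: 0 })); // false
console.log(isArrayValue("abc")); // false
```

The maintenance cost of something this small is low, which is the point of the "discounted cost of outsourcing" argument; it does not settle the license question.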

Not all dependency usage is irreducible. Most is. But some of it is born, fundamentally, out of a cost discount on dependency maintenance and a corporate deprioritization of security (in action; usually not in words).

The counterpoint is all the security issues generated when dev teams re-implement the already-well-implemented. Your points are valid, but as with anything, it is not cut and dried.

If your software is ultimately dependent on thousands of other modules from various developers all over the Internet, you have no idea whether what you're depending on is actually well implemented or not.

Didn't you just describe most Linux distributions?

No. First, Linux is an entire operating system, not a single application. Second, when people pull software from their Linux distribution that ultimately comes from developers all over the Internet, they do it to use the software themselves, not to develop applications that others are going to have to deal with. Third, Linux distributions put an extra layer of vetting in between their upstream developers and their users. And for a fourth if we need it, I am not aware of any major Linux distribution that has pulled anything like the bonehead mistakes that were admitted to in this article.

> No. First, Linux is an entire operating system, not a single application.

Sorry, to clarify: when I say "Linux distro" here, I mean the distribution package sets, like Debian or Ubuntu.

> Second, when people pull software from their Linux distribution that ultimately comes from developers all over the Internet, they do it to use the software themselves, not to develop applications that others are going to have to deal with.

The distros are chock full of intermediary code libraries that people use all the time to build novel applications depending on those libraries, which they then distribute via the distro package managers. I'm not quite sure what you mean here... I've never downloaded libfftw3-bin for its own sake; 100% of the time I've done that because someone developed an application using it that I now have to deal with.

Conversely, I've also used NodeJS and npm to build applications I intend to use myself. It's a great framework for making a standalone localhost-only server that talks to a Chrome plugin to augment the behavior of some site (like synchronizing between GitHub and a local code repo by allowing me to kick off a push or PR from both the command line and the browser with the same service).

> Third, Linux distributions put an extra layer of vetting in between their upstream developers and their users.

This is a good point. It's a centralization where npm tries to solve this problem via a distributed solution, but I'm personally leaning in the direction that the solution the distros use is the right way to go.

When I'm writing desktop software, I don't have to worry about whether yaml adds a dependency that I can't afford to maintain.

People who develop web apps want that level of convenience. And if we can't solve the security problem in a distributed fashion, web development will end up owned by big players who can pay the money to solve the problem in a centralized fashion.

> When I'm writing desktop software, I don't have to worry about whether yaml adds a dependency that I can't afford to maintain.

Why not? Because some big, centralized player has put the time, effort, and money into making yaml part of a complete library that gives you everything you need to write desktop software. Nobody writes desktop software by importing thousands of tiny libraries from all over the Internet.

I agree. As I said at the top of this thread,

> That's going to be incompatible with writing interesting software on the web, unless we want to just hand the problem over to a handful of big players who can afford to hand-vet 10,000 dependencies.

Consolidating into a distro-management-style solution would be one option.

> why don't we just copy-paste it? ... Maybe license doesn't support it.

You did say the argument was bad, but a license that prevents you from making a copy manually but allows you to make a copy through the package manager isn't a thing, is it? In either case the output of your build process is a derived work that needs to comply with the license.

Unless, perhaps, you have a LGPL dependency that you include by dynamic linking (or the equivalent in JS – inclusion as a separate script rather than bundling it?) in a non-GPL application and make sure the end user is given the opportunity to replace with their own version as required by the license.

> The reason packages are so big is the complexity for an interesting app is irreducible

These kinds of claims demand data, not just bare assertions of their truthiness.

Firefox, as an app with an Electron-style architecture (before Electron even existed), was doing some pretty interesting stuff circa 2011 (including stuff that it can't do now, like give you a menu item and a toolbar button that takes you to a page's RSS feed), with a bunch of its application logic embodied in well under 250k LOC of JS.

The last time I measured it, a Hello World created by following create-react-app's README required about half a _gigabyte_ of disk space between just before the first `npm install` and "done".

That NPM programmers don't know _how_ to write code without the kind of complexity that we see today is one matter. The claim that the complexity is irreducible is an entirely different matter.

Firefox's 250k LOC are riding on the millions of lines of code of the underlying operating system and GUI | TCP | audio toolkits that it used. To compare it to npm development, you would need to factor in the total footprint of every package that you had to install to compile Firefox in 2011.

... And I think it's an interesting question to ask why we can trust the security of, say, Debian packages and not npm, given how many packages I have to pull down to compile Firefox that I haven't personally vetted.

> Firefox's 250k LOC are riding on the millions of lines of code of the underlying operating system and GUI | TCP | audio toolkits that it used.

Right, just like every other Electron-style app that exists. The comparison I made was a fair one.

> To compare it to npm development, you would need to factor in the total footprint of every package that you had to install to compile Firefox in 2011.

No, you wouldn't. That's a completely off-the-wall comparison.

How many lines of application code (business logic written in JS including transitive NPM dependencies before minification) go into a typical Electron app in 2021? Into a medium sized web app? Is the heft-to-strength ratio (smaller is better) less than that of Firefox 4, about the same, or ⋙?

After I compile my Rust or C app (and pull all attendant libraries to make that possible, spread all over my system) I’ve downloaded about 500MB of code. The resultant binary is 10MB.

If I do the same thing with my JS app, I still download a bunch of libraries, but npm puts them all in node_modules. That’s also about 500MB. The resulting compiled/built code is around 2MB.

I dunno, seems roughly the same.

It sounds like you're using the React Hello World example to respond to the comparison to Firefox. They're separate points which stand on their own.

With respect to the package size issue, the 500MB-to-2MB observation does not bode well for the claim of irreducibility.

> The reason packages are so big is the complexity for an interesting app is irreducible.

This is absolutely, demonstrably false. Can you really claim that you use 100% of the features provided by all of the dependencies you pull in? If not, you are introducing unnecessary complexity to your code.

That doesn't mean that this is necessarily a bad thing, or that we should never ever introduce incidental complexity—we'd never get anything done if that was the case. My point is simply that there exists a spectrum that goes from "write everything from scratch" on one end all the way to "always use third-party code wherever possible" on the other. It's up to you to make the tradeoff of which libraries are worth pulling in for a given project, but when you use third-party code, you inevitably introduce some amount of complexity that has nothing to do with your app and doesn't need to be there.

I don't use 100% of the features I pull in. But I also don't use 100% of the features of libc or gtk if I'm building a GUI app in C.

I have 35 MB of node_modules, but after webpack walks the module hierarchy and tree-shakes out all module exports that aren't reachable, I'm left with a couple hundred kilobytes of code in the final product.
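To make "walks the module hierarchy and tree-shakes out all module exports that aren't reachable" concrete, here is a toy reachability pass. It is a sketch of the general idea only, not webpack's implementation: the graph shape, the module names, and `reachableExports` are all hypothetical, and real bundlers work on parsed module ASTs and have to account for side effects.

```typescript
// Toy export graph: each entry lists the other entries it references.
// All names here are made up for illustration.
type ExportGraph = Record<string, string[]>;

function reachableExports(graph: ExportGraph, entryPoints: string[]): Set<string> {
  const kept = new Set<string>();
  const pending = entryPoints.slice();
  while (pending.length > 0) {
    const current = pending.pop() as string;
    if (kept.has(current)) continue; // already visited
    kept.add(current);
    for (const dep of graph[current] ?? []) {
      pending.push(dep);
    }
  }
  return kept;
}

const moduleGraph: ExportGraph = {
  "app/main": ["lib/format", "lib/parse"],
  "lib/format": ["lib/pad"],
  "lib/parse": [],
  "lib/pad": [],
  "lib/unusedChart": ["lib/pad"],
};

const keptExports = reachableExports(moduleGraph, ["app/main"]);
const droppedExports = Object.keys(moduleGraph).filter((id) => !keptExports.has(id));

console.log(Array.from(keptExports).sort());
console.log(droppedExports); // ["lib/unusedChart"]
```

Everything in `droppedExports` still sits on disk in `node_modules`; it just never reaches the shipped bundle, which is why the installed size and the final size can differ so much.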

> But I also don't use 100% of the features of libc or gtk if I'm building a GUI app in C.

That’s exactly my point. This is a tradeoff that’s inherent to software development and has nothing to do with the web or Node or NPM. You could just as well decide to write your desktop app with a much smaller GUI library, or even write your own minimal one, if the tradeoff is worth it to reduce complexity. (Example: you’re writing an app for an embedded device with very limited resources that won’t be able to handle GTK.)

> browser-provided abstractions are trash for writing a web app

This is the key.

If browsers would improve here we wouldn't need half of the dependencies that we use now. It took nearly a decade to get from moment.js to some proper usable native functions for example.
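As one illustration of a native function that now covers a common moment.js use, here is locale-aware date formatting with the built-in `Intl.DateTimeFormat`. Whether this is the API the comment has in mind is an assumption; the date and variable names are made up for the example.

```typescript
// Fixed instant so the output does not depend on when or where this runs.
// Months are zero-based in Date.UTC, so 10 means November.
const sampleDate = new Date(Date.UTC(2021, 10, 15));

// Built-in, locale-aware formatting; no third-party date library involved.
const longDate = new Intl.DateTimeFormat("en-US", {
  year: "numeric",
  month: "long",
  day: "numeric",
  timeZone: "UTC",
}).format(sampleDate);

// ISO calendar date, taken from the standard ISO 8601 string.
const isoDate = sampleDate.toISOString().slice(0, 10);

console.log(longDate); // "November 15, 2021"
console.log(isoDate); // "2021-11-15"
```

This does not cover everything moment.js did (parsing arbitrary formats and date arithmetic, for instance), so it narrows the need for a dependency without removing it.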

Besides that we _really_ need to solve the issue of outdated browsers. Because even when those native APIs exist we'll need fallbacks and polyfills and lots of devs will opt for a non-standard option (for various reasons).

The web is still a document platform with some interactivity bolted on top, I love it but it's a fucking mess.

Without more information this mindset is stuck where the web platform was maybe a decade or more ago. Roughly a dog or cat lifetime. Consider the list of APIs at https://developer.mozilla.org/en-US/docs/Web/API I'd be curious to know if anyone active on HN could actually say they have proficiency with the entire list. Professionally speaking I wouldn't call that a mess. I'd call it a largely unused and unexplored opportunity.

Somehow people managed to develop useful software before NPM and node and so on, without having thousands of very small dependencies. Maybe it's because the stuff built in to JavaScript is nearly useless? And the older languages had a standard library that included most of the useful stuff you'd need to build something?

Ruby, Python, Go, Rust, etc all have this exact same problem; it's not unique to NPM.

JS has a culture of using lots of small, composable modules that do one thing well rather than large, monolithic frameworks, but that's only an aggravating factor; it's not the root of the problem.

The root problem is no stdlib and a language design riddled with edge case foot guns that are easy to miss in what should be trivial code.

Again, that's only an aggravating factor, not the root cause. Supply chain attacks can happen in literally any language that has a package manager.

Here's a similar issue that occurred with Python's pip just this year: https://portswigger.net/daily-swig/dependency-confusion-atta...

They do not, they have capable and trusted standard libraries and it’s quite possible to build a web app in those other languages without any external dependencies whatsoever.

JS and its culture of small dependencies that do one thing but import 100 other things to do that thing is the root of the problem here.

The GNU software ecosystem can be described as "culture of small dependencies that do one thing but import 100 other things to do that thing..." Installing, say, GIMP for the first time using `apt-get install` pulls in about 50 packages and many, many megabytes in total.

So the issue is probably something other than using bazaar-style code design. I think as other people in the thread have noted, the distros have centralized, managed, and curated package libraries that get periodically version "check-pointed" and this is not how npm works.

I may have my answer to the original thought I floated: the way this problem has been solved successfully is to centralize responsibility for oversight instead of distributing it.

> that do one thing well

And sometimes even something the language already does, but the author didn’t know.

Part of that was that we didn't make major changes to how we did things every other project back then. If we needed to do X and that wasn't built in to the language or standard library we were using we would either write our own X library or we could take the time to carefully evaluate the available third party X libraries and pick a high quality one to use. We could justify spending the time on that because we knew we'd be taking care of not just our immediate X needs but also the X needs for our next few years worth of projects.

BTW, you can build a lot of interesting things with jQuery alone.

> That's going to be incompatible with writing interesting software on the web

Lots of people are writing interesting web software without these problems - the website you’re currently posting on is one example. So I completely disagree with this statement and think you need to examine your assumptions.

There is life outside npm.

"Interesting" was a bad choice for specificity here on my part. By the definition I mean, HN isn't interesting... It's got interesting content, but the UI is a dirt-simple server-side-generated web form.

OpenStreetMap is "interesting." Docs and Sheets are "interesting." Autodesk Fusion 360 is "interesting." Facebook is "interesting." Cloud service monitoring graph builders are "interesting." The Scratch in-browser graphical coding tool is "interesting." Sites that are pushing the edge of what the browser technology is capable of are "interesting."

None of the sites you mention above would require npm to build.

At some stage after you've seen enough 'interesting' dependencies changing the world around your app as you write it you'll realise that boring is good for most of the tech you depend on - the more boring the better, and the fewer dependencies the better.

You might be surprised how small a team it took to produce Microsoft Office 2000 (last good version), or the Windows NT kernel, or WhatsApp.

One need not be a big player to write good code without 10000 dependencies.

I have to think there's a lot of YAGNI going on, dependencies that are included to be a better version of native functionality. A faster JSON parser, say, with I dunno, 20 dependencies (a count which may further extend within those deps) for something where slow JSON parsing has not yet become an issue. I think there's a lot of "academic" inclusions out there like this.

My experience working on tens of front end projects is the complete opposite. Nobody is adding dependencies just for the fun of it, or because you might need it in a year. You add a dependency because you need some functionality and there is no time/budget to re-do it in house - not to mention that if it's a well-supported library with, for example, hundreds of thousands of users, it's unlikely you could even make it better.

> there is no time/budget to re-do it in house

What are the actual time cost savings when you take the total costs into consideration?[1][2] What would it look like if you didn't implement an app by stringing together dozens/hundreds/thousands of third-party modules implemented bottom-up, but instead took control of the whole thing top-down?[3]

1. https://jvns.ca/blog/2021/11/15/esbuild-vue/

2. https://news.ycombinator.com/item?id=24495646

3. https://www.teamten.com/lawrence/programming/write-code-top-...

I agree that using node to write browser client code requires more configuration of the compilation environment than I would like (especially since I have to configure both node and some kind of packer to convert all of my es6 module dependencies into one flat pack JavaScript file).

That's a small up-front one-time cost relative to writing Redux from scratch. And before anyone asks... Yes, our use case is complex enough to justify a local state storage solution based on immutable state curated via actions and reducers. Just as our rendering use case is complex enough to justify React.
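For readers who haven't used the pattern, "immutable state curated via actions and reducers" looks roughly like the sketch below. This is not Redux's API and not this app's code; `TodoState`, `TodoAction`, `todoReducer`, and `createTinyStore` are hypothetical names, and it shows only the core idea, without the subscriptions, middleware, and tooling that make up most of what the library provides.

```typescript
// Hypothetical state and action shapes, for illustration only.
interface TodoState {
  readonly items: readonly string[];
}

type TodoAction =
  | { type: "add"; text: string }
  | { type: "clear" };

// A reducer never mutates its input; it returns the next state object.
function todoReducer(state: TodoState, action: TodoAction): TodoState {
  switch (action.type) {
    case "add":
      return { items: state.items.concat(action.text) };
    case "clear":
      return { items: [] };
  }
}

// Minimal store: holds the current state and routes actions through the reducer.
function createTinyStore(initial: TodoState) {
  let current = initial;
  return {
    getState: (): TodoState => current,
    dispatch: (action: TodoAction): void => {
      current = todoReducer(current, action);
    },
  };
}

const tinyStore = createTinyStore({ items: [] });
const stateBefore = tinyStore.getState();
tinyStore.dispatch({ type: "add", text: "audit dependencies" });
const stateAfter = tinyStore.getState();

console.log(stateBefore.items.length, stateAfter.items.length); // 0 1
console.log(stateBefore === stateAfter); // false
```

The core fits in a few dozen lines; the cost of writing it from scratch is in everything around it, which is where the build-versus-import argument in this thread really sits.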

Then you are shit out of luck and vulnerable to supply-chain attacks. Good luck with that.

Well, that's what I'm wondering. GNU/Linux distros like Debian and Ubuntu don't seem to suffer supply chain attacks, but it's not entirely clear to me why. Is it because the distros are more carefully curated, and the infrastructure for extending them older so it has had more time to wrestle security concerns to the ground?

Or, disquietingly, is it possible that they are completely vulnerable to this sort of attack and either nobody has noticed they're compromised or attackers haven't decided that compromising a major desktop Linux distro is worth the time?

https://www.zdnet.com/article/open-source-software-how-many-...

Distributions like Debian are _highly_ aware of supply chain attacks. That's one of the key reasons for projects like Reproducible Builds [0] and rekor [1] existing.

So yes, distributions are carefully curated, with a large team of experts vetting the system in a huge number of ways, and are always looking to improve upon them. Because attackers are actively attempting to compromise major distributions.

[0] https://wiki.debian.org/ReproducibleBuilds

[1] https://lwn.net/Articles/859965/