Hacker News
by asien 2459 days ago
I’m fascinated by the fact that while Node has become a new standard in the industry, and the project is receiving lots of support from all sorts of companies (IBM, Microsoft, etc.), absolutely no discussion has been opened about how much at risk the JavaScript ecosystem actually is with « npm » and its weekly dramas.

Not a month passes without something going wrong inside npm, Inc. Millions of developers are dependent on it, but nothing seems to worry people...

6 comments

There has been. CJ Silverio gave a talk on that topic at JSConf this year: https://www.youtube.com/watch?v=MO8hZlgK5zc

See also Entropic as a possible alternative to NPM: https://github.com/entropic-dev/entropic

For context: Entropic was started by several former npm Inc core developers including CJ who left the company after the CEO change was announced internally. CJ herself is the former CTO of npm Inc.

Basically most of the top developers have left the company, others were laid off. It seems to be only a matter of time until none of the people working on npm a year ago remain.

I doubt Entropic will become something significant. It seems to have already lost its momentum.
Yes. The Node ecosystem is a huge liability just waiting to happen. Any organization that depends on NPM is making a huge gamble. You can do a lot to mitigate this (private NPM repo, locks) but the reality is that the dependency chains are dangerous. Is someone in an organization going to audit all of those dependencies? Especially under the circumstances where they've been declared without an explicit version (>, >=, <, <=, ~, ^, 1.2.x, *).
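To make the version-range point concrete, here is a rough Python sketch of what the two most common npm range operators allow. The function names are made up for illustration, and it ignores pre-release tags and the rest of the range grammar. It only assumes the documented semver rules: `^` allows changes that do not modify the left-most non-zero component, and `~` allows patch-level changes when a minor version is given.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: Version) -> Version:
    """Exclusive upper bound for ^base: bump the left-most non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def tilde_upper_bound(base: Version) -> Version:
    """Exclusive upper bound for ~base: only patch releases are allowed."""
    major, minor, _ = base
    return (major, minor + 1, 0)


def satisfies(candidate: str, spec: str) -> bool:
    """Check a candidate version against an exact, ^, or ~ spec (illustrative only)."""
    version = parse(candidate)
    if spec.startswith("^"):
        base = parse(spec[1:])
        return base <= version < caret_upper_bound(base)
    if spec.startswith("~"):
        base = parse(spec[1:])
        return base <= version < tilde_upper_bound(base)
    return version == parse(spec)  # exact pin: only one version matches
```

With this sketch, `satisfies("1.9.0", "^1.2.3")` is true while `satisfies("1.9.0", "1.2.3")` is not. In other words, a range lets a fresh install pick up a release nobody in the organization has looked at, which is why locks and exact pins matter.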
As someone who has no insight into the Node/NPM/JS world, how is this different from Python's PyPi, which I would think suffers the same issue?
Mechanically, I believe the main difference is that NPM corresponds to PyPi + pip; it's both registry and repository. PyPi is Python's official package repository and pip's default source, but it's not pip's only possible source. If someone pulls code off PyPi, pip can still link to it elsewhere.

When the left-pad debacle [1] broke major packages, the triggering event was that NPM-the-registry took a module name away from a developer and gave it to a company (which held it as a trademark). He got mad and took all his code off NPM-the-repository, including left-pad. To settle the chaos, NPM had to restore the deleted code against the developer's wishes - even though his code was still available on GitHub.

Realistically, though, that would still cause some chaos and it doesn't seem to be the key difference. The more important differences are legal and practical.

npm, inc. is a private company, while PyPi (via Warehouse) and pip are both open-source and donation funded. Even if npm doesn't get up to anything malicious, they depend on keeping their registry and repository unified, and they're more likely to attract and buckle under trademark suits than PyPi.

Even more importantly, npm modules are tiny. PyPi packages and Ruby gems may be single-purpose, but they usually do something which isn't completely trivial, and common functions like math libraries are built into larger packages. npm left-pad was 17 lines of string-padding code that almost anyone could write. Other modules are even sillier; isArray has millions of weekly downloads for what is effectively a single line of code. Blank npm templates have tens of thousands of files loaded even before you start coding.[2] So fundamentally, a big part of the issue is just that node projects tend to pull in 10x or 100x more dependencies than most other projects.

[1] https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/

[2] https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how...

I think the problem is that NodeJS has a weird culture where a separate package is created for every little thing, then tons of other packages start depending on it.

I don't live in the NodeJS world, but even I heard about the Left-Pad package, which does nothing but pad a string on the left side. The author decided to pull it from the repo, rendering tons of other packages nonoperational[1].

[1] https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/

It wouldn’t surprise me to learn that this is ultimately due to how little the JS standard library actually does.
It is.

Though ES2017 added a number of string operations, including String.prototype.padStart.

The number of packages you need to audit for what would otherwise seem to be a trivial feature is exponentially larger in JS world than in Python. A good example of that is the left-pad debacle, wherein a package that left-pads a string was taken down, causing other packages - notably React - to fail to be installed because of either direct or transitive dependencies.

In the Python world, it is indeed likely that unpublishing requests will cause issues, but the number of dependencies you'd need to audit/vendor is _much_ smaller for a typical python app than it is for a typical nodejs app, so your "attack surface" is also comparatively much smaller.

Python has a standard library that is not horribly anemic, and, as far as I know, there isn't anywhere near the same propensity to atomize packages. NPM is infamous for having stupidly small packages like isEven or leftPad, and monstrous transitive dependency trees.
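As a small illustration of the standard-library point: the padding that left-pad provided is a built-in string method in Python. The helper name `left_pad` below is just for illustration; the only real API used is `str.rjust`.

```python
def left_pad(value: object, width: int, fill: str = " ") -> str:
    """Pad str(value) on the left to at least `width` characters."""
    # str.rjust ships with the language; it raises TypeError if `fill`
    # is not exactly one character, and it never truncates the input.
    return str(value).rjust(width, fill)


print(repr(left_pad("abc", 6)))      # '   abc'
print(repr(left_pad(42, 5, "0")))    # '00042'
print(repr(left_pad("toolong", 3)))  # 'toolong'
```

No third-party package, and therefore no extra node in the dependency tree, is needed for this.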
A large Python package might have fewer than 10 dependencies, while a typical JS package easily has hundreds if not thousands of dependencies. It's much easier to manage your dependencies when you can count them on your fingers.
Difference in approach to libraries and the standard library.

NPM/Node is very much a "we only provide the bare minimum for a language, everything else must be implemented on your own." The JavaScript stdlib is very small (not that weird when you consider it's originally a language to do stuff with in your browser).

The result is that a lot of "simple" functionality that most languages would put in the stdlib (left-pad is the most infamous example) has to be reimplemented by library developers. Because programmers are for the most part not interested in copying the same code over and over, these simple pieces of functionality end up on npm, where they are used by somewhat bigger libraries, until each major library sits on top of a massive dependency chain, since its dependencies rely on other dependencies, and so on and so forth.

This sounds interesting in theory, but in reality it almost always means that if one thing in this chain breaks (for example, a "simple" library introduces a breaking change but doesn't properly adhere to semantic versioning, since nobody enforces semantic versioning on npm although it's recommended), essentially the entire chain is broken and the top-level library stops functioning as well.

And then you end up with packages such as left-pad, which provide simple functionality that almost all major libraries need. The last thing you want is for the maintainer to either remove or break the package, since this essentially results in a dependency-hell cascade as suddenly several million packages are broken.
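A rough sketch of that cascade, using an entirely made-up dependency graph (every package name other than left-pad is hypothetical, and the edges are not real npm metadata). It walks the graph to find every package that depends on a given one, directly or transitively, which is the set that breaks when that package disappears.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical graph: package -> packages it depends on directly.
DEPENDS_ON: Dict[str, List[str]] = {
    "web-framework": ["template-engine", "router"],
    "template-engine": ["string-utils"],
    "router": ["string-utils"],
    "string-utils": ["left-pad"],
    "cli-tool": ["left-pad"],
    "left-pad": [],
    "unrelated-lib": [],
}


def affected_by_removal(removed: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every package that depends on `removed`, directly or transitively."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: Dict[str, List[str]] = {name: [] for name in graph}
    for package, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(package)

    # Breadth-first walk over the inverted graph.
    affected: Set[str] = set()
    queue = deque([removed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


print(sorted(affected_by_removal("left-pad", DEPENDS_ON)))
# ['cli-tool', 'router', 'string-utils', 'template-engine', 'web-framework']
```

Even in this toy graph, removing one leaf package takes out five of the other six, and only two of those five list it as a direct dependency.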

Python, on the other hand, has probably one of the biggest standard libraries I've seen in a programming language, and the difference is striking. Most PyPi libraries focus on adding specialized features or wrapping lower-level libraries in higher-level ones (e.g. requests is really nothing more than a very good wrapper around urllib). There are very few "simple functionality" libraries on PyPi, since most of that functionality is already in the standard library. Usually, if a "simple functionality" library isn't in the standard library, it's because it changes too often (the standard library is mainly for code that rarely changes; requests is an example) or is still somewhat specialized (e.g. sqlalchemy or a couple of validator packages I use).

There are pros to Node's approach (e.g. you're generally not locked into a single approach), but generally Python's approach to library management is better in my opinion.

The people that would care the most are mostly immune to npm issues.

Privately hosted npm repos, checked-in to version control systems, etc.

There are plenty of solutions if you don't like npm. However, they carry the traditional costs of managing dependencies; if you want super super super easy above all else, that's what npm does.

I don't think that's accurate. Most devs and thus companies treat NPM as a utility. Maybe very large companies would not feel it, but if NPM went down tomorrow there would be utter chaos on the internet.
There would be chaos, just not for FAANG.
Someone upthread asked "why is NPM different from PyPi/pip in this?"

There are lots of practical answers - PyPi is open source, Python packages aren't so fragmented, and so on. But honestly, a huge part of the difference is that PyPi has sponsors like PyPi and AWS using its baseline implementation. NPM's private repository system means the public system just doesn't have that kind of pressure on it.

I would be deeply surprised if AWS teams use public pypi. Much more reasonable would be to mirror public packages they use internally.

What if a minor version change contains a relicensing of the library, for instance?

Good point. Presumably they're fixing versions. Even companies on public registries should do that to avoid re-licensing issues; otherwise it'd be an unreasonable legal & security risk.

I guess my broader thought was that PyPi is a more reliable free offering than NPM because it's not focused on a 'premium' version for the biggest users. But that's different than AWS - presumably they're sponsoring it in a broader "making development accessible is good for AWS" sense.

Honest question, is it that much better in other communities? In particular, is there anything inherent to npm that's problematic or is it just that a huge community with a Unix mindset (small packages that do one thing well) is problematic?
Lack of a large enough standard library is a big differentiating factor, as it makes you very dependent on third party libraries.

Even if you avoid it by creating your own utils, chances are that the creators of the large packages you use (like a database manager or a REST framework) will depend directly or indirectly on those third party tools.

No other package manager is run by a VC backed startup. That's inherently problematic and means the registry is in the hands of a company that could be killed off or sold at any moment because it needs to make massive profits (without having any way to truly make a profit in the first place) to continue to exist.
It was a problem in Perl's otherwise very successful CPAN. I don't recall any "dramas", but because of the huge number of dependencies, installing Catalyst had a low chance of going right the first time. But! Because of the test-everything culture and the CPAN Testers effort, broken modules were very rare. So after you managed to set up your system and took a long walk while the tests ran, it was guaranteed to work. More or less.
Any community dependent on a particular repo is at risk. I'd argue that RubyGems is just as risky as NPM, for example.

The problem is that there's no good fix or alternative; it's hard to avoid a single point of failure.

There is actually a lot of discussion about this. There is also a lot of work going into different approaches to solve this.

For example Frea. It's a conservative approach for incrementally federating the package registry and you can start using it today. It's ready for all your traffic.

website: https://freajs.com

how it works: https://docs.google.com/presentation/d/16pxrYfpxxKRzhpMM0zZV...

Why was this downvoted?
People care. Developers have limited registry options due to lock-in. Creating effective security processes across a massive ecosystem of open source developers is a difficult problem. Registries can't easily create security practices that fit a heterogeneous pool of OSS governance and development models, especially when implementing more rigorous security has the potential to diminish their network effects and developer productivity. Curious to see how npm, GitHub Package Manager, and others address these issues.