by esens 1804 days ago
I found that much of the underlying cause is people mass-reporting regex denial-of-service issues as high-severity bugs.

So many people are reporting these in tons of different projects: https://github.com/search?q=regex+denial+of+service&type=iss...

Anyhow it is just annoying and they broke NPM Audit based on these reports.

It is good to fix all possible bugs, but many of these are nowhere near as bad as the reports make them out to be.

But maybe this is needed to just get rid of these issues in general? So a wave of regex vulnerability reports and then we build this type of checking into prettier or similar and we do not have these in the future?

EDIT: It appears there was a project that found 100s of CVE-reported regex vulnerabilities in npm projects -- this is maybe one of the sources of the mass reports. See the bottom of this resume: https://yetingli.github.io

9 comments

I'm a maintainer of a few of the larger packages on npm. This is generally pretty accurate. Snyk Security seems only to find regex DoS bugs and I'm a bit disappointed in them being classified as high severity, and they're the only ones submitting reports right now.

They seem pretty adamant on filing CVEs despite what the owner says (It's normally fine but these DoS vulns require very large input to be handed into the function by untrusted sources, which given how these libraries work isn't going to be very common).

Now, I have people yelling at me about dependent packages not being updated because they don't understand version ranges, or because some audit states they are high vulns, or whatever.

Super broken, everything related to npm's package lock stuff is broken by design. I've been saying it for years now and it seems people still cling to blindly trusting what corporations say.

> Super broken, everything related to npm's package lock stuff is broken by design. I've been saying it for years now and it seems people still cling to blindly trusting what corporations say.

Because this isn't true. Just because you're experiencing this effect (which blows), doesn't mean the tool and related tooling are somehow broken. These Regex issues should be fixed, libraries should update to safe versions, things should advance and any incentive we have we should use to make this happen.

I think it helps to inform the developer about possible issues, but I think in most cases depending on the software this is plainly not relevant and can be ignored. I wouldn't classify it as high severity. Also, it might just not be trivial to develop a regex library that cannot be DDOSed or the mechanism that was declared a vulnerability.

Might be nice to be able to tag libraries that should be ignored in audits. Perhaps there is such a function; I'm not really an NPM expert. But if your project has too many of these "high severity" problems, you probably stop running the audits.

Still, I think the availability of such audits from the package manager is quite neat. As an embedded dev I think these are certainly luxury problems.

> Just because you're experiencing this effect (which blows), doesn't mean the tool and related tooling are somehow broken.

I've been in the node scene since 0.10. That's around 10 years. My packages have billions of downloads annually. My viewpoint here carries the weight of hours of debug time and frustration and confused users of my code, as well as meeting and knowing the npm staff at the time quite personally, and knowing under which circumstances package lock files were implemented.

They are broken.

> These Regex issues should be fixed

They are, pretty much immediately after they're reported.

> libraries should update to safe versions

I check all the version ranges of dependent libraries when I push a patch with vuln fixes. They get pulled just fine without needing to update every single package. This is what version ranges are for.

> things should advance

Yes but this is nebulous and vague and aside from the point.

> and any incentive we have we should use to make this happen.

I don't see where the disagreement is. This is exactly what happens all the time, nothing is the problem here. I don't get your point.

---

Package lock files were designed in a few short days and pushed out prematurely without much review by a single Npm employee (at the time) since they promised it for the v5 release. They were on a time crunch because they were trying to keep with Node.js's next major release timeline, which operates independently of npm's (at least, that's how it was conveyed to me).

So this change got pushed out, had an absolute mountain of bugs that took ages to fix (e.g. at one point adding a new dependency would delete your entire node_modules folder), and promised added security when in reality they do nothing of the sort.

Instead, they cause subtle caching-related bugs, they add an artifact to source control (which is always code smell in my book), crap up diffs/PRs, cause headaches across platforms, and do very little to help... anything, really.

They're super, super broken by design. Yet npm tells you you need them ("please commit this to your repository") and refuses to do basic security things without them (npm audit).

So why were they added? IIRC it was because the version resolution was a massive strain on npm's servers, so lockfiles removed the need to fetch tons of version information each time you added another dependency.

Oh, and don't even begin to whine about them on Twitter (at the time), lest you be yelled at by the implementor for being ignorant or something.

It was a shit show. They add absolutely nothing to the industry.

How do you make your builds hermetic and reproducible without package locks?
Your builds are not reproducible with anything related to npm. Neither npm nor any bundler that I'm aware of guarantees that.

Unless we're not talking about the same reproducibility property. Also I don't know what "hermetic" means in this context but I doubt it's anything that npm solves correctly.

There is a way, but it's troublesome. Create a docker image with installed node modules. Save it, and from then onwards you have frozen node modules. If you need a new dependency/updated version you need to create a new image and npm i.
Ha ha, our builds are not “hermetic and reproducible” with package locks. Why? Caching.
How does caching affect this? Are your devs / build processes not doing clean installs?
> I've been in the node scene since 0.10. That's around 10 years. My packages have billions of downloads annually. My viewpoint here carries the weight of hours of debug time and frustration and confused users of my code, as well as meeting and knowing the npm staff at the time quite personally

So when you say "npm staff at the time", do you mean at the time of node 0.10?

> and knowing under which circumstances package lock files were implemented.

> Package lock files were designed in a few short days and pushed out prematurely without much review by a single Npm employee (at the time)

The amusing thing about your comment here is the parts which are accidentally correct.

`package-lock.json` files use the same file format as `npm-shrinkwrap.json` files. Always have, although of course the format of this file has changed significantly over the years, most dramatically with npm 7.

The "design" of the shrinkwrap/package-lock file was done rather quickly, since it was initially just a JSON dump of (most of) the data structure that npm was already using for dependency tree building. However, as far as I know, the days were the standard length of 24 hours, so while that may be "short", certainly shorter than I'd often prefer, they were (as far as I know) no shorter than any other days.

This was indeed shipped without any review by even a single "npm employee", which should not be surprising, as "npm" was not at that time a legal entity capable of hiring employees. The initial work was done by Dave Pacheco, and reviewed by npm's author (at that time its sole committer and entire development staff), both of whom were Joyent employees at the time.

The use of a shrinkwrap as a non-published normal-use way to snapshot the tree at build time and produce reproducible builds across machines and time was not implemented by default until npm v5, but there wasn't really much to rush, on that particular feature. You could argue that npm 5 itself was rushed, and that's probably a fair claim, since there was some urgency to ship it along with node version 8, so as not to wait a year or more to go out with node v10.

> So this change got pushed out, had an absolute mountain of bugs that took ages to fix...

Idk, I think calling it a "mountain" is relative, actually ;)

> They're super, super broken by design.

I know you're using this phrase "broken by design" in the same sense as the author of the OP means it, but... has language just changed on me here, and I didn't notice?

As I've always heard the term used, something is "broken by design" when the actual intent is for a system to fail in some way, to achieve some goal. For example, a legislative or administrative process that is intentionally slow-moving and unable to accomplish its goals in a reasonable time frame, with the hope that this leaves room for independent innovation. Or a product that requires some minor upgrade or repair to continue working, so that the seller can keep tabs on their customers more easily. That kind of thing.

I think what you mean is not that it's "broken by design", but rather it's "a broken design". Unless this is like "begging the question", and I should just accept that I'm gradually coming to speak a language of the past, while the future moves on. It's certainly not intended to cause problems, as far as I'm aware.

If you really do mean "broken by design" (in the sense of a tail light that goes out after 50k miles so that you will visit the dealership and they can sell you more stuff), I'm super curious what you think npm gets out of it.

> Yet npm tells you you need them ("please commit this to your repository") and refuses to do basic security things without them (npm audit).

As of npm v7, there's no longer any practical reason why it can only audit the lockfile, rather than the actual tree on disk. Just haven't gotten around to implementing that functionality. If you want it changed, I suggest posting an issue https://github.com/npm/cli/issues. There's some question as to whether to prioritize the virtual tree or the actual tree, since prioritizing the actual tree would be a breaking change, but no reason why it can't fall back to that if there's no lockfile present.

But even approaching build reproducibility is impossible without lockfiles. If a new version of a transitive dependency is published between my install and yours, we'll get different package trees. If we both install from the same lockfile, we'll get the same package tree. (Not necessarily the same bytes on disk, since install scripts can change things, but at least we'll fetch the same bytes, or the build will fail.)
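To make the point about transitive dependencies concrete, here is a minimal Python sketch of range resolution with and without a pinned version. It is not npm's resolver: the package name `some-dep`, the registry snapshots, and the helper `resolve_caret` are all hypothetical, and the caret handling is simplified to major versions of 1 or higher.

```python
def parse(version: str) -> tuple:
    """Turn "1.2.3" into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_caret(range_spec: str, published: list) -> str:
    """Pick the highest published version satisfying ^X.Y.Z (X >= 1 only).

    Simplified assumption: ^X.Y.Z means >= X.Y.Z and < (X+1).0.0.
    """
    base = parse(range_spec.lstrip("^"))
    candidates = [
        v for v in published
        if parse(v)[0] == base[0] and parse(v) >= base
    ]
    return max(candidates, key=parse)


# Hypothetical registry state for "some-dep" at two points in time.
registry_at_my_install = ["1.2.0", "1.2.1"]
registry_at_your_install = ["1.2.0", "1.2.1", "1.3.0"]  # 1.3.0 published in between

# Without a lockfile, each install re-resolves the range on its own.
mine = resolve_caret("^1.2.0", registry_at_my_install)
yours = resolve_caret("^1.2.0", registry_at_your_install)

# With a lockfile, both installs read the pinned version instead of resolving.
lockfile = {"some-dep": mine}
mine_locked = lockfile["some-dep"]
yours_locked = lockfile["some-dep"]

print(mine, yours)                # the two unlocked installs differ
print(mine_locked, yours_locked)  # the two locked installs agree
```

The only thing the sketch models is which version gets selected; it says nothing about install scripts or caching, which the comment above treats as separate concerns.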

> So why were they added? IIRC it was because the version resolution was a massive strain on npm's servers, so lockfiles removed the need to fetch tons of version information each time you added another dependency.

You do not recall correctly, sorry. (Or maybe you correctly recall an incorrect explanation?) The answer is reproducible builds. Using a lockfile does reduce network utilization in builds, but not very significantly.

> Oh, and don't even begin to whine about them on Twitter (at the time), lest you be yelled at by the implementor for being ignorant or something.

I hope my tone is civil and playful enough in this message to not consider my response "yelling".

Read as: tough, that company says you need to do the work so I demand you do the work.
A regex "denial of service" "vulnerability" could be important, if it shows up in code that processes untrusted input from end users.

But NPM Audit has no idea of context-- a "critical" bug in `browserlist`, which, in this context, is never used outside the development process and never takes input outside of what's in my package.json, gets the same prominence (or more so, since it's early in alphabetical order) as a "critical" bug in Express, potentially allowing my server to be compromised.

I'm not really sure what the solution is here; NPM's just a package manager and doesn't know how you're using a given package. A simple heuristic distinguishing development dependencies and runtime dependencies in NPM Audit might be a start, but that doesn't help with situations like create-react-app's react-scripts where everything, runtime or dev dependency, is a transitive dependency of one package declared as a runtime dependency.

Agreed!

A “Critical” bug in a dev context should mean something very different from a “Critical” bug in a prod context. A “Critical” devDependency bug should be a direct threat to the developer’s context, either by infecting the dev machine or by injecting a supply-chain problem, worming its way into downstream contexts.

npm audit is just not granular OR careful enough to address these issues appropriately.

Agree 100%
Would be nice if package.json had a flag to indicate the runtime would be either Node.js or a browser. So many of these "bugs" have no bearing in a browser context.
The package.json should be able to actively ignore vulnerability IDs. As IDs disappear with audits, npm audit could just remove those, e.g. a "prune".
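A rough Python sketch of what such an ignore list with pruning could look like. Everything here is hypothetical: the advisory IDs, the shape of a finding, and the function `apply_ignore_list` are made up for illustration and do not describe any existing npm option.

```python
def apply_ignore_list(findings: list, ignored_ids: set) -> tuple:
    """Return (findings still to report, pruned ignore list).

    `findings` is a list of dicts with at least an "id" key (assumed shape).
    """
    reported_ids = {finding["id"] for finding in findings}
    # Drop findings the project has explicitly chosen to ignore.
    remaining = [f for f in findings if f["id"] not in ignored_ids]
    # "Prune": keep only ignore entries that still match a current finding.
    pruned = ignored_ids & reported_ids
    return remaining, pruned


# Hypothetical audit output and ignore list.
findings = [
    {"id": "ADV-1", "package": "dev-tool", "severity": "high"},
    {"id": "ADV-2", "package": "web-server", "severity": "high"},
]
ignored_ids = {"ADV-1", "ADV-9"}  # ADV-9 no longer appears in any audit

remaining, pruned = apply_ignore_list(findings, ignored_ids)
print([f["id"] for f in remaining])  # only the finding that is not ignored
print(sorted(pruned))                # stale entry ADV-9 has been dropped
```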
IMHO one solution would be to categorize vulnerabilities separately for prod dependencies and dev dependencies, and bubble that categorization up.

For example, a RegEx DDoS vulnerability in Express would show up as high severity, while the same would not show in the bundler you use, or any package that your bundler has in its dependency tree.
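As a sketch of that categorization in Python, under loud assumptions: the finding fields (`dev_only`, `kind`, `severity`), the package names, and the rule "demote ReDoS to low when only reachable through dev dependencies" are all hypothetical choices for illustration, not how npm audit works.

```python
def effective_severity(finding: dict) -> str:
    """Demote ReDoS findings that are reachable only via dev dependencies."""
    if finding["dev_only"] and finding["kind"] == "redos":
        return "low"
    return finding["severity"]


def group_by_scope(findings: list) -> dict:
    """Bucket findings into prod and dev, carrying the adjusted severity."""
    groups = {"prod": [], "dev": []}
    for finding in findings:
        scope = "dev" if finding["dev_only"] else "prod"
        groups[scope].append((finding["package"], effective_severity(finding)))
    return groups


# Hypothetical findings: the same class of bug in a server and in a bundler.
findings = [
    {"package": "http-framework", "kind": "redos", "severity": "high", "dev_only": False},
    {"package": "bundler-plugin", "kind": "redos", "severity": "high", "dev_only": True},
]

report = group_by_scope(findings)
print(report["prod"])  # the server-side finding keeps its reported severity
print(report["dev"])   # the build-time finding is demoted
```

The hard part the thread points out is computing `dev_only` in the first place, since a package like react-scripts pulls build tooling in as transitive dependencies of a runtime dependency; the sketch simply assumes that flag is already known.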

Other developers have no idea of context either. Unless you have a way of enforcing that certain code is never exposed to user input (and I agree that a build-time-only dependency does solve that), sooner or later it will be.

Accepting regexes from user input is a really insidious class of bug that can go undetected for years. I've seen real outages caused by it, so it's absolutely worth doing something proactive about.

True story, the npm registry was once taken down (not maliciously, just by accident) by a ReDOS in node-semver. That was extra fun to debug because the failure happened inside of CouchDB.
> A regex "denial of service" "vulnerability" could be important, if it shows up in code that processes untrusted input from end users.

But in this context what's the end result? Chrome locking up on the end user's (attacker's) machine? Again, an "attacker" doesn't have access to the source code for distribution. By inputting bad regexp data they're only DoSing themselves, no?

Could be in a service on a server, in which case a regex DoS could lock the server for all users.
Or if a JS frontend takes in input that comes from other users-- something like forum post titles or content.

That's "just" a browser freeze for end users, but still a potential DOS vulnerability if it's in the application's critical path.

Right, but that's why I wrote "context," and seems to be the primary complaint in this article.
StackOverflow and Cloudflare have both self-DoSed themselves with such flaws, causing downtime.
This kind of nonsense really goes back to the broken CVE process. https://opensourcesecurity.io/2021/03/30/its-time-to-fix-cve...

Linux kernel maintainer Greg Kroah-Hartman has a similar opinion. https://github.com/gregkh/presentation-cve-is-dead/blob/mast...

Edit: LWN mention https://lwn.net/Articles/801157/

The view of SQLite developers on CVEs is also dim: https://www.sqlite.org/cves.html
Beautifully succinct. This quote: "Grey-hat hackers are rewarded based on the number and severity of CVEs that they write. This results in a proliferation of CVEs that have minor impact, or no impact at all, but which make exaggerated impact claims." Alignment of incentives is messed up. Goodhart-Strathern's and Campbell's laws apply.
Sounds like academic research publications. Sure, that will totally be a key step toward cancer therapy or better biofuels (realistically, the PI gets his jollies by shoving aldehyde groups onto random molecules)
Oh, you mean like the guys who tried to inject vulnerabilities into the linux kernel and got their entire university on Greg Kroah-Hartman's shit list? https://news.ycombinator.com/item?id=26887670
That sounds better than a PI that gets their jollies by shoving nitro groups into unsuspecting organic molecules.

Although I still have a deep admiration for the Klapötke "Energetic Materials" group at Munich Uni.

Maybe then writing and submitting a CVE should cost some money that’s paid back together with the reward if the vulnerability is found to be "reasonable" upon review?
I'm always suspicious of just throwing money at a problem, particularly in things like open source where money isn't always the motivator and can often be a corrupting influence. In some cases this will reduce the ability of genuinely well-intentioned people to participate, simply because they don't have the money up front, and for well-funded organizations the money would have to be quite a lot.

I'd like to ask what, other than money, directly motivates people? Is it prestige? A line on their resume? A requirement for a bootcamp class? In addition, we should re-evaluate the difficulty of submitting a CVE. Is it too easy? The story about a mass of "hey your regex parser could choke on this weird expression[1]" reports suggests that perhaps so. What can we do to make it so that CVEs and equivalents are truly meaningful? Also, just the fact that CVE reports are given a great deal of respect could be the problem, although at this point that seems to be self-correcting.

[1] Some classes of regex parsers are known to be vulnerable by nature, those that do backtracking for example, because their worst-case runtime grows exponentially and can run in unbounded time. This has been known since at least 2009. There are other implementations with better worst-case runtimes, but worse performance in typical cases. The fact that it's trivially easy to look at a regex parser to see if it does backtracking and construct an "evil" expression that breaks it means it's trivially easy to file a DOS report against any such parser.
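For readers who have not seen the blow-up, here is a small Python model of it. It is a hand-written step counter that mimics how a naive backtracking matcher would explore the pattern `^(a+)+$`, not the internals of any real engine; the function name and the choice of pattern are illustrative assumptions.

```python
def count_backtracking_steps(text: str) -> int:
    """Count the group attempts a naive backtracker makes for ^(a+)+$."""
    steps = 0

    def match_groups(pos: int) -> bool:
        nonlocal steps
        # Find how far a single (a+) group could extend from pos.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Try each possible group length, greedy (longest) first.
        for stop in range(end, pos, -1):
            steps += 1
            if stop == len(text):
                return True   # consumed everything: overall match
            if match_groups(stop):
                return True   # the remaining groups matched
        return False          # every split failed: backtrack

    match_groups(0)
    return steps


# A matching input succeeds on the first attempt.
print(count_backtracking_steps("a" * 16))

# A near-miss input (trailing "b") forces every split of the a's to be tried,
# so the attempt count roughly doubles with each extra "a".
for n in (4, 8, 12, 16):
    print(n, count_backtracking_steps("a" * n + "b"))
```

In this model the attempt count for n a's followed by a non-matching character is 2^n - 1, which is why a few dozen characters of crafted input can be enough, and also why it is so cheap to generate a report against any pattern with nested quantifiers.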

AFAIK MITRE has a process for an organization to register as a vendor, and then it would accept CVEs for their products only from the vendor, not from random people. Of course this has an opposite failure mode that may have unscrupulous vendors hide issues or just be lazy in issuing CVEs for existing bugs, but it eliminates the problem of random people issuing a ton of CVEs for non-issue bugs.
I'm pretty sure CVEs and the like came about because vendors were choosing to hide or deny security vulnerabilities. Vulnerability disclosure policies are a whole different kettle of worms.
Maybe, but you risk swinging too far in the opposite direction into under-reporting of vulnerabilities.
Believe me, 90% of people who find bugs for a living are perfectly content with keeping them to themselves and/or selling them privately.
I had a researcher contact me about a "vuln" in an OSS effort of mine once. The vuln made no sense w/ how the tool was used, but they published and I earned a CVE scarlet letter nonetheless. I finally "fixed" it, but IMHO, nothing was ever broken or vulnerable.
I wouldn't call a CVE a scarlet letter. Given the current state of software engineering, it's more like "my project is valuable enough to be used by someone that cares about security". You fixed it, one less bug to worry about. No doubt there are many less popular products with many worse vulnerabilities that don't have a CVE.

Even OpenBSD had to change their tagline to "Only two remote holes in the default install, in a heck of a long time!" (from "Five years without a remote hole in the default install!") Still a pretty impressive track record.

> You fixed it, one less bug to worry about.

Those "bugs" can be features though - or the work involved to fix the bug meant that high-impact feature work - or other bugfixes, had to be postponed or even cancelled.

Our SaaS frequently gets security "researchers" (read: people running online scanners) submitting emails through our contact-form informing us about click-jacking attacks on our login-page - the problem for us is that we have a lot of second-party and third-party integrations on unbounded origins that offer access to our application, and by extension our login-screen through an <iframe> on their own origin, which is sometimes even an on-prem LAN web-server accessed through embedded devices where we can't use popups to do it properly - let alone switch to a more robust OIDC system - so there is no easy solution that makes the "I ran a tool, gimme $100" people go-away without causing a much bigger problem to now exist.

> there is no easy solution that makes the "I ran a tool, gimme $100" people go-away

I use the "mark as spam" button :)

> so there is no easy solution that makes the "I ran a tool, gimme $100" people go-away without causing a much bigger problem to now exist.

Maybe consider setting up a free-tier HackerOne bounty program? I think they triage to some degree on your behalf.

In the free tier? Triage is a paid service.
Here's the video of the talk that goes with Greg Kroah-Hartman's slides: https://www.youtube.com/watch?v=HeeoTE9jLjM
CVEs can be used to try and extort OSS maintainers into expensive consulting gigs: https://www.patreon.com/posts/unfixed-security-21250652
I wouldn't take Greg's opinions on security too seriously.

Spender has a much more nuanced, informed view. I think it covers the issues of the CVE process well, but doesn't make the same mistakes that Greg does.

https://www.grsecurity.com/reports_of_cves_death_greatly_exa...

The more I work with parsing, parser combinators and writing grammars for little languages, the less often I find myself using or wanting to use any regex at all. When I do, I always feel like there should be a better way, perhaps a type safe way of accessing the info I need and so on. It feels "Ugh, there should be a better way to do this." Especially in JavaScript, regexes blow in comparison to languages with named matching groups and all that. In JS regex really feels horrible, even more cryptic than in other languages.

I think regexes are often used as a quick and dirty solution to problems, which should be solved differently. But once the regex "works" and is in place, others begin to rely on that output. Over time cruft begins to accumulate and the regex is forgotten or at least never replaced with anything more appropriate.

> The more I work with parsing, parser combinators and writing grammars for little languages, the less often I find myself using or wanting to use any regex at all.

Surprise: The most common parser combinator libraries do backtracking. That's exactly the problem. Any solution as widely used (if not overused) as regular expressions ends up exposing a number of dark corners where the design isn't as clean and tight as you would want it. There are lots of better ways, but most of them are specialized and are totally unsuited for significant areas where people need something.

That said: yes I've used LR(1) parsing (not LALR) using a library that uses parser combinators with a good interface, and it's more powerful than regex and worth it for the right usecase.

> Especially in JavaScript, regexes blow in comparison to languages with named matching groups and all that

Good news (well, probably). JavaScript (ECMAScript 2018+) now supports named matching groups.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guid...

Add prototype pollution and you've covered 90% of all "vulnerabilities"
We got bit by this last week; our scans were suddenly all red, and nobody could deploy to production. We had to write an analysis of why this wasn't actually dangerous to us in order to get security to suppress the findings.
Hmm. This actually sounds really plausible. I wonder if there's any way to check that.
Having 20 supposedly high-risk issues about possible DoS when all come from a build dev-dependency is just totally useless. If only I could add whitelists, then it would be bearable. Like "I don't care about this kind of issue in a dev dependency".
Isn't this an area where gamification and machine learning could actually be useful, if applied carefully?

If people are competing for CVEs, then why not work out a way to better differentiate them through scoring and make this visible? The goal would be for attention to shift to the scoring instead of only a CVE count. Offer both views of the world, so tools could still fall back on the problematic listings they get today.

Apply machine learning to classify CVEs based on the reputation of the reporter, blast radius, or other criteria. Use that to drive community review and scoring.

I would not see this as a panacea because it brings a lot of challenges (a la StackOverflow), but it would be much better than what we have today.

We're kind of already doing scoring in that CVEs are usually graded on severity, but researchers are motivated to inflate the severity of CVEs they find. So the question you'd need to tackle is how does one apply a universal standard to measure the real impact of a CVE?

I suspect it's an impossible challenge, but I only dip into this domain casually so maybe someone has better ideas.

I'm not making the claim it's a universal standard, but there are likely indications that some researchers are a different pedigree from others. A researcher reporting the same kind of low grade vulnerability probably shouldn't carry the same reputation score as other researchers.

I don't think there is a perfect way to do this, and I don't think there is an absolute standard that can be applied. It will be unfair to some people, but the system should have options for resolution when there are egregious mistakes. I'm not making the claim either, that the views of the data you are interested in are the ones I might be interested in. A good system would provide some different levels which itself is an incentive towards better research that would break through.

I'm more inclined to think the better solution would be to stop issuing CVEs for trivial "exploits" like Regex DOS, unless there's actual demonstrated uses of the exploit.