by gameman144 276 days ago
> It's not feasible for me to audit every single one of my dependencies, and every one of my dependencies' dependencies

I think this is a good argument for reducing your dependency count as much as possible, and keeping them to well-known and trustworthy (security-wise) creators.

"Not-invented-here" syndrome is counterproductive if you can trust all authors, but in an uncontrolled or unaudited ecosystem it's actually pretty sensible.

8 comments

Have we all forgotten the left-pad incident?

This is an ecosystem that has taken code reuse to the (unreasonable) extreme.

When JS was becoming popular, I’m pretty sure every dev cocked an eyebrow at the dependency system and wondered how it’d be attacked.

> This is an ecosystem that has taken code reuse to the (unreasonable) extreme.

Not even that, actually. The wheel is reinvented over and over again in this exact ecosystem. Many packages are low quality, and not even suitable to be reused much.

It's the perfect storm: on one side, junior developers who are afraid of writing even trivial code and are glad if there's a package implementing functionality that can be done in a one-liner; on the other side, (often junior) developers who want to prove themselves and think the best way to do that is to publish a successful npm package.
The blessing and curse of frontend development is that there basically isn't a barrier to entry given that you can make some basic CSS/JS/HTML and have your browser render it immediately.

There's also the flavor of frontend developer that came from the backend and sneers at actually having to learn frontend because "it's not real development"

Ha, that's a funny attitude. And here I was thinking that, as someone who mostly does backend work, I'd rather make the best of the situation when I have to do frontend dev, and try to do "real development" by writing trivial things myself, instead of worsening the situation by gluing together mountains of bloat.
> There's also the flavor of frontend developer that came from the backend and sneers at actually having to learn frontend because "it's not real development"

What kind of code does this developer write?

As little code as possible to get the job done without enormous dependencies. Avoiding JS and using CSS and HTML as much as possible.
In my experience, generally speaking, there is a kind of developer like this who tries to write a language they’re familiar with, but in JavaScript. As the pithy saying goes, it takes a lot of skill to write Java in every language.
Usually they write only prompts and then accept whatever is generated, ignoring all typing and linting issues.
People pushing random throwaway packages is not the issue.

A lot of the culture is built by certain people who make a living out of package maximalism.

More packages == more eyeballs == more donations.

They have an agenda that small packages are good, and they make PRs into popular packages to inject their junk into the supply chain.

Not on HN, the land of "you should use a SaaS or PaaS for that (because I might eventually work there and make money)" or "I don't want to maintain that code because it's not strictly related to my CRUD app business! how dare you!"
1.2 million weekly downloads to this day, when we've had builtin padStart since ES2017.
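
For scale, here is roughly what a left-pad helper amounts to when written by hand. This is a minimal sketch, not the published package's code; the name `leftPad` and its parameters are made up for illustration.

```typescript
// Hand-written left-pad, kept dependency-free for illustration.
// On ES2017+ runtimes the built-in equivalent is: "5".padStart(3, "0")
function leftPad(input: string, targetLength: number, padChar: string = " "): string {
  // Guard against an empty pad string, which would otherwise loop forever.
  if (padChar.length === 0) {
    return input;
  }
  let result = input;
  // Only the first character of padChar is used; padStart itself also
  // accepts multi-character pad strings.
  while (result.length < targetLength) {
    result = padChar.charAt(0) + result;
  }
  return result;
}

console.log(leftPad("5", 3, "0")); // 005
console.log(leftPad("abc", 2)); // abc (already long enough, returned unchanged)
```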

Yes, I remember thinking at the time "how are people not ashamed to install this?"

I found it funny back when people were abandoning Java for JavaScript thinking that was better somehow...(especially in terms of security)

NPM is good for building your own stack but it's a bad idea (usually) to download the Internet. No dep system is 100% safe (including AI, generating new security vulns yay).

I'd like to think that we'll all stop grabbing code we don't understand and thrusting it into places it doesn't belong, or at least do it more slowly. However, I also don't have much faith in the average (especially frontend web) dev. They are often the same idiots doing XYZ in the street.

I predict more hilarious (scary even) kerfuffles, probably even major militaries losing control of things à la Terminator.

It’s not clear to me what this has to do with Java vs JavaScript (unless you’re referring to the lack of a JS standard library which I think will pretty much minimize this issue).

In fact, when we did have Java in the browser it was loaded with security issues primarily because of the much greater complexity of the Java language.

Java has Maven, and is far from immune to similar types of attacks. However, it doesn't have the technological monstrosity named NPM. In fact that aforementioned complexity is/was an asset in raising the bar, however slightly, in producing Java packages. Crucially, that ecosystem is nowhere near as absurdly complex (note, I'm ignoring the ill-fated cousin that is Gradle, which is also notorious for being a steaming pile of barely-working inscrutable dependencies).

Anyways, I think you are missing the forest for the trees if you think this is a Java vs JavaScript comparison, don't worry it's also possible to produce junk enterprise code too...

Just amusing watching people be irrationally scared of one language/ecosystem vs another without stopping to think why or where the problems are coming from.

It's not the language, it's the library that's not designed to isolate untrusted code from the start. Much harder to exit the sandbox if your only I/O mechanism is the DOM, alert() and prompt().
And the whole rest of the Internet...

The issue here is not Java or its complexity. The point is also not Java, it's incidental that it was popular at the time. It's people acting irrationally about things and jumping ship for an even-worse system.

Like, yes, if that really were the whole attack surface of JS, sure nobody would care. They also wouldn't use it...and nothing we cared about would use it either...

The security issues with Java applets usually led to local unsandboxed code execution. It's a lot harder to do that with JS because just running Java and confusing the security manager gets you full Java library access, vs JS with no built in I/O.
In that era JavaScript was also loaded with security issues. That's why browsers had to invest so much in kernel sandboxing. Securing JavaScript VMs written by hand in C++ is a dead end, although ironically given this post, it's easier when they're written in Java [1].

But the reason Java is more secure than JavaScript in the context of supply chain attacks is fourfold:

1. Maven packages don't have install scripts. "Installing" a package from a Maven repository just means downloading it to a local cache, and that's it.

2. Java code is loaded lazily on demand, class at a time. Even adding classes to a JAR doesn't guarantee they'll run.

3. Java uses fewer, larger, more curated libraries in which upgrades are a more manual affair involving reading the release notes and the like. This does have its downsides: apps can ship with old libraries that have unfixed bugs. Corporate users tend to have scanners looking for such problems. But it also has an upside, in that pushing bad code doesn't immediately affect anything and there's plenty of time for the author to notice.

4. Corporate Java users often run internal mirrors of Maven rather than having every developer fetch from upstream.
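
To make point 1 concrete from the npm side: npm runs a package's `preinstall`, `install`, and `postinstall` scripts at install time, so one cheap check is to list which of those hooks a manifest declares before letting it run. This is a rough sketch; the manifests and the `findInstallHooks` helper are hypothetical, not part of any real tool.

```typescript
// Lifecycle hooks that npm runs automatically when a package is installed.
const INSTALL_HOOKS: string[] = ["preinstall", "install", "postinstall"];

// Minimal shape of the package.json fields this sketch looks at.
interface PackageManifest {
  name: string;
  scripts?: { [hook: string]: string };
}

// Returns the install-time hooks a manifest declares, so they can be reviewed
// before the package is allowed to run them.
function findInstallHooks(manifest: PackageManifest): string[] {
  const scripts = manifest.scripts || {};
  return INSTALL_HOOKS.filter(function (hook) {
    return Object.prototype.hasOwnProperty.call(scripts, hook);
  });
}

// Hypothetical manifests, for illustration only.
const quietPackage: PackageManifest = {
  name: "example-quiet",
  scripts: { test: "node test.js" },
};
const noisyPackage: PackageManifest = {
  name: "example-noisy",
  scripts: { postinstall: "node setup.js", test: "node test.js" },
};

console.log(findInstallHooks(quietPackage)); // no install-time hooks declared
console.log(findInstallHooks(noisyPackage)); // declares "postinstall"
```

The blunt alternative is to install with npm's `--ignore-scripts` flag, at the cost of breaking packages that genuinely need a build step.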

The gap isn't huge: Java frameworks sometimes come with build system plugins that could inject malware as they compile the code, and of course if you can modify a JAR you can always inject code into a class that's very likely to be used on any reasonable codepath.

But for all the ragging people like to do on Java security, it was ahead of its time. A reasonable fix for this kind of supply chain attack looks a lot like the SecurityManager! The SecurityManager didn't get enough adoption to justify its maintenance costs and was removed, partly because of those factors above that mean supply chain attacks haven't had a significant impact on the JVM ecosystem yet, and partly due to its complexity.

It's not clear yet what securing the supply chain in the Java world will look like. In-process sandboxing might come back or it might be better to adopt a Chrome-style microservice architecture; GraalVM has got a coarser-grained form of sandboxing that supports both in-process and out-of-process isolation already. I wrote about the tradeoffs involved in different approaches here:

https://blog.plan99.net/why-not-capability-languages-a8e6cbd...

[1] https://medium.com/graalvm/writing-truly-memory-safe-jit-com...

If it's not feasible to audit every single dependency, it's probably even less feasible to rewrite every single dependency from scratch. Avoiding that duplicated work is precisely why we import dependencies in the first place.
Most dependencies do much more than we need from them. Often it means we only need one or a few functions from them. This means one doesn't need to rewrite whole dependencies usually. Don't use dependencies for things you can trivially write yourself, and use them for cases where it would be too much work to write yourself.
A brief but important point is that this primarily holds true in the context of rewriting/vendoring utilities yourself, not when discussing importing small vs. large dependencies.

Just because dependencies do a lot more than you need, doesn't mean you should automatically reach for the smallest dependency that fits your needs.

If you need 5 of the dozens of Lodash functions, for instance, it might be best to just install Lodash and let your build step shake out any unused code, rather than importing 5 new dependencies, each with far fewer eyes and release-management best practices than the Lodash maintainers have.

The argument wasn’t to import five dependencies, one for each of the functions, but to write the five functions yourself. Heck, you don’t even need to literally write them, check the Lodash source and copy them to your code.
This might be fine for some utility functions which you can tell at a glance have no errors, but for anything complex, if you copy you don't get any of the bug/security fixes that upstream will provide automatically. Oh, now you need a shim of this call to work on the latest Chrome because they killed an API: you're on your own, or you have to read all of the release notes for a dependency you don't even have! But taking a dependency on some other library is, as you note, always fraught. Especially because of transitive dependencies, you end up having quite a target surface area for every dep you take.

Whether to take a dependency is a tricky thing that really comes down to engineering judgement, the thing that you (the developer) are paid to make the calls on.

The massive amount of transitive dependencies is exactly the problem with regard to auditing them. There are successful businesses built solely around auditing project dependencies and alerting teams of security issues, and they make money at all because of the labor required to maintain this machine.

It’s not even a judgement call at this point. It’s more aligned with buckling your seatbelt, pointing your car off the road, closing your eyes, flooring it and hoping for a happy ending.
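
A toy illustration of why the transitive part dominates the audit burden: walk a dependency graph from the root and count everything reachable. The graph and package names below are invented; a real one would come from a lockfile.

```typescript
// Direct dependencies per package. ASSUMPTION: hypothetical names and graph.
interface DependencyGraph {
  [pkg: string]: string[];
}

// Looks up a package's direct dependencies, treating unknown packages as leaves.
function directDependencies(graph: DependencyGraph, pkg: string): string[] {
  return Object.prototype.hasOwnProperty.call(graph, pkg) ? graph[pkg] : [];
}

// Returns every package reachable from `root` (direct and transitive), sorted.
function transitiveDependencies(graph: DependencyGraph, root: string): string[] {
  const seen: { [pkg: string]: boolean } = Object.create(null);
  const pending: string[] = directDependencies(graph, root).slice();
  while (pending.length > 0) {
    const current = pending.pop() as string;
    // Skip packages already visited, which also makes cycles harmless.
    if (seen[current] || current === root) {
      continue;
    }
    seen[current] = true;
    for (const child of directDependencies(graph, current)) {
      pending.push(child);
    }
  }
  return Object.keys(seen).sort();
}

const exampleGraph: DependencyGraph = {
  "app": ["web-framework", "color-log"],
  "web-framework": ["router", "body-parser"],
  "router": ["path-match"],
  "body-parser": ["bytes"],
  "color-log": ["ansi-codes"],
};

const everything = transitiveDependencies(exampleGraph, "app");
console.log(directDependencies(exampleGraph, "app").length); // 2 direct
console.log(everything.length); // 7 in total once transitive ones are counted
```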

And then when node is updated and natively supports set intersections you would go back to your copied code and fix it?
If it works, why do so? Unless there's a clear performance boost, and if so you already know the code and can quickly locate your interpreted version.

Or, at the time of adding, you can add a NOTE or FIXME comment stating where you copied it from. A quick grep for such keywords can give you a nice overview of nice-to-have stuff. You can also add a ticket with all the details if you're using a project management tool and resuscitate it when that hypothetical moment happens.
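
Something like this, as a sketch. The `NOTE(vendored)` tag is just a convention invented here, and the function is a hand-written stand-in rather than code copied from any particular library.

```typescript
// NOTE(vendored): hand-written stand-in for a small set-intersection helper.
// Origin: <record the package name, version, and license here when copying>.
// Revisit once native set intersection is available everywhere we deploy.
function intersection<T>(left: T[], right: T[]): T[] {
  const result: T[] = [];
  for (const item of left) {
    // Keep items present in both inputs, without duplicates in the output.
    if (right.indexOf(item) !== -1 && result.indexOf(item) === -1) {
      result.push(item);
    }
  }
  return result;
}

console.log(intersection([1, 2, 2, 3], [2, 3, 4])); // [ 2, 3 ]
```

A grep for `NOTE(vendored)` then lists every copied helper and where it came from.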

If you won't, do you expect the maintainer of some micro package to do that?
You have obviously never checked the Lodash source.
The point here isn’t a specific library. It’s not even one specific language or runtime. No one is talking about literally five functions. Let’s not be pedantic and lose sight of the major point.
Yes, fewer, larger, trustworthy dependencies with tree shaking is the way to go if you ask me.
Almost like a standard library..
Yeah, but perhaps we could have different flavors. If you like functional style you could have a very functional standard library that doesn't mutate anything, or if you like object oriented stuff you could have classes of object with methods that mutate themselves. And the TypeScript folks could have a strongly typed library.
I wanted to make a joke about

   npm install stdlib 
…but double checked before and @stdlib/stdlib has 58 dependencies, so the joke preempted me.
I think the level of protection you get from that depends on how the unused code detection interacts with whatever tricks someone is using for malicious code.
I agree with this but the problem is that a lot of the extra stuff dependencies do is indeed to protect from security issues.

If you’re gonna reimplement only the code you need from a dependency, it’s hard to know how much of the stuff you’re leaving out is just extra stuff you don’t need and how much is security fixes that may not be apparent to you, but that the dependency, by virtue of being worked on and used by many people, has picked up.

I'm using LLMs to write stuff that would normally be in dependencies, mostly because I don't want to learn how to use the dependency, and writing a new one from scratch is really easy with LLMs.
Age of bespoke software is here. Did you have any hard to spot non-obvious bugs in these code units?
It isn't feasible to audit every line of every dependency, just as it's not possible to audit the full behavior of every employee that works at your company.

In both cases, the solution is similar: try to restrict access to vital systems only to those you trust, so that you have less need to audit their every move.

Your system administrators can access the server room, but the on-site barista can't. Your HTTP server is trusted enough to run in prod, but a color-formatting library isn't.

> It isn't feasible to audit every line of every dependency, just as it's not possible to audit the full behavior of every employee that works at your company.

Your employees are carefully vetted before hiring. You've got their names, addresses, and social security numbers. There's someone you're able to hold accountable if they steal from you or start breaking everything in the office.

This seems more like having several random contractors who you've never met coming into your business in the middle of the night. Contractors that were hired by multiple anonymous agencies you just found online somewhere with company names like gkz00d or 420_C0der69 who you've also never even spoken to and who have made it clear that they can't be held accountable for anything bad that happens. Agencies that routinely swap workers into or out of various roles at your company without asking or telling you, so you don't have any idea who the person working in the office is, what they're doing, or even if they're supposed to be there.

"To make things easier for us we want your stuff to require the use of a bunch of code (much of which does things you don't even need) that we haven't bothered looking at because that'd be too much work for us. Oh, and third parties we have no relationship with control a whole bunch of that code which means it can be changed at any moment introducing bugs and security issues we might not hear about for months/years" seems like it should be a hard sell to a boss or a client, but it's sadly the norm.

Assuming that something is going to go wrong and trying to limit the inevitable damage is smart, but limiting the amount of untrustworthy code maintained by the whims of random strangers is even better. Especially when the reason for including something that carries so much risk is to add something trivial or something you could have just written yourself in the first place.

> This seems more like having several random contractors who you've never met coming into your business in the middle of the night. [...] Agencies that routinely swap workers into or out of various roles at your company without asking or telling you, so you don't have any idea who the person working in the office is, what they're doing, or even if they're supposed to be there.

Sounds very similar to how global SIs staff enterprise IT contracts.

That hit much too close to reality. It's exactly like that. Even the names were spot on!
This is true to the extent that you actually _use_ all of the features of a dependency.

You only need to rewrite what you use, which for many (probably most) libraries will be 1% or less of it

Indeed. About 26% of the disk space for a freshly-installed copy of pip 25.2 for Python 3.13 comes from https://pypi.org/project/rich/ (and its otherwise-unneeded dependency https://pypi.org/project/Pygments/), "a Python library for rich text and beautiful formatting in the terminal", hardly any of the features of which are relevant to pip. This is in spite of an apparent manual tree-shaking effort (mostly on Pygments) — a separate installed copy of rich+Pygments is larger than pip. But even with that attempt, for example, there are hundreds of kilobytes taken up for a single giant mapping of "friendly" string names to literally thousands of emoji.

Another 20% or more is https://pypi.org/project/requests/ and its dependencies. This is an extremely popular project despite the fact that the standard library already provides the ability to make HTTPS connections (people just hate the API that much). One of requests' dependencies is certifi, which is basically just a .pem file in Python package form. The vendored requests has not seen any tree-shaking as far as I can tell.

This sort of thing is a big part of why I'll be able to make PAPER much smaller.

What paper?
Yes, that. I didn't want to be too spammy, especially since I honestly haven't been getting much of anything done recently (personal reasons).
> it's probably even less feasible to rewrite every single dependency from scratch.

When you code in a high-security environment, where bad code can cost the company millions of dollars in fines, somehow you find a way.

The sibling commenter is correct. You write what you can. You only import from trusted, vetted sources.

> If it's not feasible to audit every single dependency, it's probably even less feasible to rewrite every single dependency from scratch.

There is no need to rewrite dependencies. Sometimes it just so happens that a project can live without outputting fancy colorful text to stdout, or doesn't need to spread transitive dependencies on debug utilities. Perhaps these concerns should be a part of the standard library, perhaps these concerns are useless.

And don't get me started on bullshit polyfill packages. That's an attack vector waiting to be exploited.

It's much more feasible these days. These days for my personal projects I just have CC create only a plain HTML file with raw JS and script links.
Not sure I completely agree, as you often use only a small part of a library.
One interesting side effect of AI is that it sometimes makes it easy to just recreate the behavior, perhaps without even realizing it.
is it that infeasible with LLMs?

a lot of these dependencies are higher order function definitions, which never change, and could be copy/pasted around just fine. they're never gonna change

"rewrite every single dependency from scratch"

No need to. But also no need to pull in a dependency that could be just a few lines of own (LLM generated) code.

>>a few lines of own (LLM generated) code.

... and now you've switched the attack vector to a hostile LLM.

Sure but that's a one time vector. If the attacker didn't infiltrate the LLM before it generated the code, then the code is not going to suddenly go hostile like an npm package can.
Though you will see the code at least, when you are copy pasting it and if it is really only a few lines, you may be able to review it. Should review it of course.
If it's that little, review the dependency.
The difference is, the dependency can change and is usually way harder to audit. Subfolders within subfolders, 2 lines here in one file, 3 lines there, vs. looking at a few files and checking what they do.
I did not say to do blind copy paste.

A few lines of code can be audited.

Sounds like the job for an LLM tool to extract what's actually used from appropriately-licensed OSS modules and paste directly into codebases.
Requiring you to audit both security and robustness on the LLM generated code.

Creating two problems, where there was one.

I didn't say generate :) - in all seriousness, I think you could reasonably have it copy the code for e.g. lodash.merge() and paste it into your codebase without the headaches you're describing. IMO, this method would be practical for a majority of npm deps in prod code. There are some I'd want to rely on the lib (and its maintenance over time), but also... a sort function is a sort function.
LLMs don't copy and paste. They ingest and generate. The output will always be a generated something.
You can give an LLM access to tools that it can invoke to actually copy and paste.
In 2022, sure. But not today. Even something as simple as generating and running a `git clone && cp xyz` command will create code not directly generated by the LLM.
LLMs can do the audits now.
Do you have any evidence it wouldn't just make up code?
This is already a thing, compiled languages have been doing this for decades. This is just C++ templates with extra steps.
>> and keeping them to well-known and trustworthy (security-wise) creators.

The true threat here isn't the immediate dependency though, it's the recursive supply chain of dependencies. "trustworthy" doesn't make any sense either when the root cause is almost always someone trustworthy getting phished. Finally if I'm not capable of auditing the dependencies it's unlikely I can replace them with my own code. That's like telling a vibe coder the solution to their brittle creations is to not use AI and write the code themselves.

> Finally if I'm not capable of auditing the dependencies it's unlikely I can replace them with my own code. That's like telling a vibe coder the solution to their brittle creations is to not use AI and write the code themselves.

In both cases, actually doing the work and writing a function instead of adding a dependency or asking an AI to write it for you will probably make you a better coder and one who is better able to audit code you want to blindly trust in the future.

Just like it's going to make you a better engineer if you design the microchips in your workstation yourself instead of buying an x86 CPU.

It's still neither realistic nor helpful advice.

"A little copying is better than a little dependency" -- Go proverb (also applies to other programming languages)
IMO, one thing I like in npm packages is that usually they are small, and they should ideally converge towards stability (frozen)...

If they are not, something is bad and the dependency should be "reduced" if at all possible.

Exactly.

I always tried to keep the dependencies to a minimum.

Another thing you can do is lock versions to a year ago (this is what linux distros do) and wait for multiple audits of something, or lack of reports in the wild, before updating.
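
A rough sketch of that waiting-period idea: given publish dates per version, pick the most recently published version that is already older than the cutoff. The data and the `newestAgedVersion` name are invented; the input shape is assumed to resemble the `time` field in npm registry metadata.

```typescript
// Maps version strings to ISO-8601 publish timestamps.
// ASSUMPTION: this mirrors the shape of the "time" field in npm registry
// metadata; the data below is made up.
interface PublishTimes {
  [version: string]: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the most recently published version that is at least `minAgeDays`
// old as of `now`, or null if every version is still inside the waiting period.
// Note: this orders by publish date, not by semver.
function newestAgedVersion(times: PublishTimes, minAgeDays: number, now: Date): string | null {
  const cutoff = now.getTime() - minAgeDays * DAY_MS;
  let best: string | null = null;
  let bestTime = -Infinity;
  for (const version of Object.keys(times)) {
    // ASSUMPTION: bookkeeping keys such as "created"/"modified" are not versions.
    if (version === "created" || version === "modified") {
      continue;
    }
    const published = Date.parse(times[version]);
    // Skip unparseable dates and anything published after the cutoff.
    if (isNaN(published) || published > cutoff) {
      continue;
    }
    if (published > bestTime) {
      best = version;
      bestTime = published;
    }
  }
  return best;
}

const exampleTimes: PublishTimes = {
  "1.0.0": "2023-01-01T00:00:00Z",
  "1.1.0": "2023-11-01T00:00:00Z",
  "1.2.0": "2024-12-20T00:00:00Z",
};
const today = new Date("2025-01-01T00:00:00Z");

console.log(newestAgedVersion(exampleTimes, 365, today)); // 1.1.0
console.log(newestAgedVersion(exampleTimes, 7, today)); // 1.2.0
```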

I saw one of those word-substitution browser plugins a few years back that swapped "dependency" for "liability", and it was basically never wrong.

(Big fan of version pinning in basically every context, too)

I'm re-reading all these previous comments, replacing "dependency" with "liability" in my mind, and it's quite fun to see how well everything still keeps meaning the same, but better.
> I think this is a good argument for reducing your dependency count as much as possible, and keeping them to well-known and trustworthy (security-wise) creators.

I wonder to what extent the extreme dependency count is a symptom of a standard library that is too minimalistic for the ecosystem's needs.

Perhaps this issue could be addressed by a "version set" approach to bundling stable npm packages.

I remember people in the JS crowd getting really mad at the implication that this all was pretty much inevitable, like 10/15 years ago. Can’t say they didn’t do great things since then, but it’s not like nobody saw this coming.
Easier said than done when your ecosystem of choice took the Unix philosophy of doing one thing well, misinterpreted it and then drove it off a cliff. The dependency tree of a simple Python service is incomparable to a Node service of similar complexity.