I'm curious to hear if anyone else does any due diligence before using open source packages in production? Not anything major - just checking for recent commits / activity, issue logs, etc.
Do you have a pragmatic approach that you typically recommend, considering that (without version pinning thousands of packages) anything could change from one day to the next? (even if a package was "good" today, it could turn "bad" tomorrow)
The best I've been able to come up with is to pick things that have minimal dependencies of their own. It doesn't eliminate the threat, but it does at least reduce it.
No, it's not easy to achieve. From a security perspective the idea is to always run the latest version, but updates that break backwards compatibility make that a more difficult proposition. In reality you need to have an engineer test or analyze the updates. Some mature libraries maintain backwards compatibility (e.g. OpenSSL), but with something like npm it becomes almost impossible.
I used to audit open source code for security issues on a regular basis, and even now, before I install a public-facing application, I generally have a look at the code.
It's not often I spot anything major, but I figure if I have the time I should do it just in case.
I often look at the code for PHP extensions, npm libraries, and similar that colleagues introduce, just to be sure there's nothing blatantly horrid going on.
If you work in a secure environment or support critical infrastructure there are teams whose sole purpose is to approve/deny releasing software regardless of who wrote it. Such teams will typically require source code, written justification, senior management signed approval, and test validation. In the case where source code is not provided, such as closed source commercial software, the vendor will be required to accept liability for all losses due to their software as ratified by a signed contract.
I normally read a good chunk, if not all, of the code of a dependency before I add it to my projects, except in the case of community-standard things (in Ruby) such as ActiveSupport or Sequel. Going over a prospective dependency a few months ago bore fruit and proved why you should always do this. NewsAPI is a neat little API for fetching news whose docs just so happen to show a Ruby gem. Being the lazy developer I am, I'd rather use the gem than build another API client, but before I did that I read the source, as one should. Lo and behold, what do I find but the evil eval in the code for a dirt simple API client. No thanks.
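For anyone who wants a first pass before reading the source by hand, here is a minimal sketch of a scan for calls like the `eval` mentioned above. The pattern list, the file suffixes, and the function name `find_suspicious_calls` are my own assumptions for illustration, not anything from the gem in question, and a hit only means "look closer", not "malicious".

```python
import re
from pathlib import Path

# Hypothetical pattern list: calls that deserve a closer look in a small client library.
SUSPICIOUS = re.compile(r"\b(eval|instance_eval|class_eval|exec|system)\b")


def find_suspicious_calls(root, suffixes=(".rb", ".js", ".py")):
    """Return (path, line_number, stripped_line) for every line matching SUSPICIOUS."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular source files with one of the chosen suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if SUSPICIOUS.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Point it at an unpacked copy of the package, e.g. `find_suspicious_calls("path/to/unpacked/gem")`. It is a plain text match, so it will also flag comments and strings, and it will miss anything obfuscated. It does not replace reading the code.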
I am paranoid about security of all those packages, so yes, even before just downloading, I check the authors, activity and read the source code. Not always – e.g. I skip the source code if it's something big AND very reputable AND I decided that I need it such as scala/scala or facebook/react – but I do my best.
It's very annoying, it's not free, and it affects what kinds of libraries I use. My projects have fewer and smaller dependencies than typical because of these self-imposed constraints.
On the upside, borrowing a pattern or a dozen lines of code instead of pulling a dependency that will remain 90% unused is really underrated. As is understanding how things work under the hood.
React itself is big & reputable, but the dependency tree is massive and I doubt that it's getting fully vetted on an ongoing basis. Even if you vet it today, any given dependency can be updated to something else tomorrow.
There are definitely some things in React's dependency tree that are a bit questionable if you are sensitive enough to any given problem, beyond just security. For example, packages where the license being used is contradictory between the package.json vs the LICENSE file or the full license terms are not expressed within these but are clarified in the README.md.
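A rough sketch of how the package.json vs LICENSE contradiction could be spotted automatically. Everything here is an assumption for illustration: the `MARKERS` table is a tiny hand-picked set of phrases, the license file is assumed to be named exactly `LICENSE`, and the `license` field is assumed to be a plain string. It will not catch terms that only appear in the README.md.

```python
import json
from pathlib import Path

# Hypothetical, deliberately tiny mapping from SPDX-style ids to a phrase
# that usually appears in the matching license text.
MARKERS = {
    "MIT": "permission is hereby granted, free of charge",
    "Apache-2.0": "apache license",
    "ISC": "permission to use, copy, modify, and/or distribute",
}


def guess_license(text):
    """Guess an id from license text by phrase matching; None if nothing matches."""
    lowered = " ".join(text.lower().split())  # collapse line wrapping and case
    for spdx_id, phrase in MARKERS.items():
        if phrase in lowered:
            return spdx_id
    return None


def license_mismatch(package_dir):
    """Return (declared, detected) if they disagree, otherwise None."""
    root = Path(package_dir)
    manifest = json.loads((root / "package.json").read_text(encoding="utf-8"))
    declared = manifest.get("license")
    license_file = root / "LICENSE"
    detected = None
    if license_file.exists():
        detected = guess_license(license_file.read_text(encoding="utf-8"))
    if declared != detected:
        return (declared, detected)
    return None
```

Running `license_mismatch` over each directory in `node_modules` gives a list of packages worth a manual look. An unrecognized license text shows up as a mismatch too, which is probably what you want for a review queue.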
I would be surprised if anyone has enough resources or willingness to do that for every open source package they are using. Companies that go through auditing can cover themselves by relying on products like Nexus IQ.
> Really? What about all the dependencies from those dependencies?...
Yep, all the way to the end.
I got the idea from a book called "Hollywood Secrets of Project Management Success" by James R. Persse. It's two books interleaved really, one is just a standard pitch for Agile methods (IMO), but the other is a presentation of the process that large film studios use to make movies. The movie industry is ~100 years old and mostly very good at bringing in projects on time and under budget.
Somewhere in there he talks about how they'll track their dependencies in a kind of "portfolio", I forget the details, but it translates in IT to a "dependency portfolio" and you would (if you're large enough) have an actual "Deps Dept." and a Deps Manager whose sole job is tracking dependencies and their updates and patches, etc.
> working on an app that has 82 npm dependencies
Ach! Well, see, there's your problem right there. :-)
Seriously though, one of the benefits of a dependency portfolio is to help you know when your system has gotten out of hand. The problems are still there even if you don't look at them, eh?
> It's absolutely ridiculous to investigate all of them... but it's also necessary if you want to be sure...
Thanks for the response, that's an interesting way to deal with it. How do you verify a dependency? Do you literally examine the source code? Make sure the build can be reproduced? Or just the metadata (downloads, stars)? Has the portfolio actually prevented any vulnerabilities?
It's pretty common for JS projects to have thousands of transitive dependencies, so I'm not sure keeping a private portfolio is much use. The entire open-source ecosystem is built on the foundation of trust. If I use a package that's being used by 500 other packages, I can have a high degree of certainty that the package is safe, and by locking the dependencies with yarn.lock I can prevent sneaky updates from entering the system.
Anyway maybe I'll look into the dependency portfolio, see how it goes.
> How do you verify a dependency? Do you literally examine the source code?
Yeah. It's part of the overhead of using the software. You also look at the history of bugs and how they were handled.
> It's pretty common for JS projects to have thousands of transitive dependencies
Yeah, I know, and it's bonkers IMO.
> The entire open-source ecosystem is built on the foundation of trust
In practice, yes, but in theory, no. The whole idea is that you get to see the code you're running, because the guys who wrote it are clowns. Free Software started when RMS wanted to fix his printer and Xerox said, "No."
> if I use a package that's being used by 500 other packages, I can have a high degree of certainty that the package is safe
I think history has shown that that reasoning is at best probabilistic, eh? You're gambling.
Now, of course, there are limits. Some things get a pass. Do we audit the source of the bash shell? No, despite the fact that it's maintained by a single volunteer.
> Anyway maybe I'll look into the dependency portfolio, see how it goes.
Check out that "Hollywood Secrets" book I mentioned.
I just don't see how anyone could realistically look at all the lines of code that any nontrivial JavaScript app relies on in any depth.
I'm sure most people don't review the code for their operating system, drivers, web server, compiler, browser etc. but they do assess if the entities that write + support them are worth trusting. This is likely the only realistic approach for complex JavaScript apps also.
> > I just don't see how anyone could realistically look at all the lines of code that any nontrivial JavaScript app relies on in any depth.
> Right. And that's really bad.
> > I'm sure most people don't review the code for their operating system, drivers, web server, compiler, browser etc.
> Right, but some people do. Hire one of them.
The interesting question isn't whether you can do it, it's when you should, to what extent, and how much it will cost.
"Always do it, do it in-depth, the time consumed isn't important and the budget isn't important" is a bad approach for example and isn't helpful to the OP.
Successful software development is all about making appropriate tradeoffs - you're not going to get very far by conducting your own OpenSSL audit when all you want to do is write a todo web app.
The "deps portfolio" gets updated whenever the deps change. In practice the flow goes like this:
0. A dev wants to use a new dependency, likely after experimenting with it a little bit.
1. Preliminary evaluation, which includes a transitive dependency scan. ("Too many dependencies" is a valid fail condition all on its own.)
2. If everything looks good we bring it and its deps into our internal repo. This includes the plumbing to add it to our dev|test|production envs. (Using Docker or whatever.)
3. Now the devs can use it in code destined for prod. There's a nice page in the company wiki that lists the exact version(s) with links to the docs, bug trackers, mailing lists, etc. and also the internal company lore for that package.
It's tight.
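To make step 1 concrete, here is a minimal sketch of a transitive dependency scan with a "too many dependencies" fail condition. The graph format (package name mapped to a list of direct deps), the function names, and the `max_deps=20` threshold are all assumptions for illustration. In real use the graph would come from whatever your package manager can export.

```python
from collections import deque


def transitive_deps(graph, root):
    """Breadth-first walk of `graph` (package name -> list of direct deps).

    Returns the set of every package reachable from `root`, excluding `root`.
    """
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        name = queue.popleft()
        # Skip packages already visited and guard against cycles back to root.
        if name in seen or name == root:
            continue
        seen.add(name)
        queue.extend(graph.get(name, []))
    return seen


def preliminary_check(graph, root, max_deps=20):
    """Return (passed, sorted list of transitive deps) for the candidate `root`."""
    deps = transitive_deps(graph, root)
    return len(deps) <= max_deps, sorted(deps)
```

For example, with `graph = {"a": ["b", "c"], "b": ["c", "d"], "d": ["a"]}`, calling `preliminary_check(graph, "a", max_deps=2)` fails the candidate because it pulls in three packages (`b`, `c`, `d`). The sorted list is what would go on the wiki page for the people doing the actual review.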
- - - -
This might seem like a lot of work up front, but think about all the work it saves down the line.
When one of the transitive dependencies fixes a security issue, is it then re-evaluated prior to being updated in the internal repo?
I'm guessing you work at a pretty large tech company. It seems wasteful that so many companies might be replicating this work. I wonder if there might be the opportunity for a body to review & approve packages on behalf of many companies. Perhaps npm will eventually move in this direction.
Doesn't everyone? That's one of the annoying parts of using other people's code. You have no idea how good or bad it is until you have thoroughly vetted it.
No, most people don't. They even have a hard time keeping library versions up to date.