by _xivi 1038 days ago
> I had to provide audits of our licenses of libraries we use in our code base to investors and attest we didn’t have AGPL code in our code/software in our stack.

A little off topic, but how are these audits usually done? I'd like to familiarize myself with the process.

It doesn't have to be your specific case; a pointer to an online resource would be fine.

3 comments

We run a script that scans our source code and generates a manifest of all included libraries including license/copyright. Here's one of them:

https://github.com/nexB/scancode-toolkit

There are others, but scancode is probably the best one; their license database is truly huge.

https://wiki.debian.org/CopyrightReviewTools

https://scancode-licensedb.aboutcode.org/
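To give a rough idea of what such a scan does, here is a toy sketch. It is not how scancode works internally. It assumes files carry `SPDX-License-Identifier:` tags, which many code bases don't, and the function names (`find_spdx_tags`, `build_manifest`, `flag_agpl`) are made up for illustration. A real tool also matches full license texts and copyright notices.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches tags such as "SPDX-License-Identifier: Apache-2.0" up to end of line.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([^\r\n]+)")


def find_spdx_tags(text):
    """Return the SPDX license expressions declared in a piece of text."""
    found = []
    for match in SPDX_TAG.finditer(text):
        expr = match.group(1).strip()
        # Drop a trailing comment closer, e.g. "MIT */" or "MIT -->".
        for closer in ("*/", "-->"):
            if expr.endswith(closer):
                expr = expr[: -len(closer)].strip()
        found.append(expr)
    return found


def build_manifest(root):
    """Map each declared license expression to the files that declare it."""
    manifest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it in this sketch
        for expr in find_spdx_tags(text):
            manifest[expr].append(str(path.relative_to(root)))
    return dict(manifest)


def flag_agpl(manifest):
    """Return only the manifest entries whose license expression mentions AGPL."""
    return {lic: files for lic, files in manifest.items() if "AGPL" in lic.upper()}
```

Calling `flag_agpl(build_manifest("path/to/checkout"))` gives a first-pass list of files to look at before attesting anything. An empty result only means no tagged AGPL files were found, not that none exist.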

In the less automated version, a requirement to the effect of "list all the OSS libraries you use and their licenses" is pushed down to each team, and someone spends a day or two going through their code base. Usually that amounts to looking at a lock/dependency file and finding each library's license online.
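For a Python project, part of that manual lookup can be approximated from package metadata. The following is a minimal sketch, assuming the dependencies are installed in the current environment and that their authors filled in the license fields (many don't, or do so inconsistently). `license_of` and `installed_licenses` are names invented for this example.

```python
from importlib import metadata


def license_of(meta):
    """Return a best-effort license string from one package's metadata."""
    declared = (meta.get("License-Expression") or meta.get("License") or "").strip()
    if declared and declared.upper() != "UNKNOWN":
        # Some packages paste the full license text here; keep the first line only.
        return declared.splitlines()[0]
    # Fall back to trove classifiers such as
    # "License :: OSI Approved :: MIT License".
    classifiers = [
        c.split("::")[-1].strip()
        for c in (meta.get_all("Classifier") or [])
        if c.startswith("License ::")
    ]
    return "; ".join(classifiers) or "UNKNOWN"


def installed_licenses():
    """Return sorted (name, version, license) tuples for installed packages."""
    rows = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") or "UNKNOWN"
        version = meta.get("Version") or "UNKNOWN"
        rows.add((name, version, license_of(meta)))
    return sorted(rows)


if __name__ == "__main__":
    for name, version, lic in installed_licenses():
        print(f"{name}=={version}: {lic}")
```

Anything that comes back as `UNKNOWN` still needs the manual step of finding the license online, and the declared metadata is only the author's claim, so it is a starting point for the audit rather than the audit itself.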

You can also run an audit against your artifact store/cache if you have one. JFrog Artifactory has built-in tools for auditing dependencies (and can act as a pull-through cache), so you can run reports that way, but it can be harder to tie dependencies back to what's using them.

Back when I worked at Chase, it was against company policy to use 3rd party dependencies that weren't in their internal components database. Part of getting something into the components database was establishing an owner, the license, and the version (basically a paperwork/approval process). In addition, part of the deploy process was running a vulnerability check against your dependencies using some proprietary enterprise software that tracked CVEs and could (somewhat...) parse what dependencies an application was using (ideally, automatically...).
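The shape of that kind of gate can be sketched in a few lines. Everything here is hypothetical: the `ApprovedComponent` record, its fields, and `check_dependencies` are assumptions made for illustration and do not describe Chase's actual system or any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedComponent:
    """One hypothetical entry in an internal components database."""
    name: str
    versions: frozenset  # exact versions that went through approval
    license: str
    owner: str


def check_dependencies(dependencies, approved):
    """Compare (name, version) pairs against the approved components.

    `approved` maps a lowercase package name to its ApprovedComponent.
    Returns a list of (name, version, reason) tuples for every violation.
    """
    violations = []
    for name, version in dependencies:
        entry = approved.get(name.lower())
        if entry is None:
            violations.append((name, version, "not in components database"))
        elif version not in entry.versions:
            violations.append((name, version, "version not approved"))
    return violations


if __name__ == "__main__":
    approved = {
        "requests": ApprovedComponent(
            "requests", frozenset({"2.31.0"}), "Apache-2.0", "platform-team"
        ),
    }
    deps = [("requests", "2.31.0"), ("requests", "2.19.0"), ("leftpad", "1.0.0")]
    for violation in check_dependencies(deps, approved):
        print(violation)
```

A deploy pipeline would fail the build if the returned list is non-empty. The hard part in practice is producing an accurate `dependencies` list in the first place.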

We had to do this at Raytheon, too, and when someone asks why government projects take so long and run over budget, this is at least one reason. We had a mandate from our government customer to reuse code originally written by Sandia National Labs, to avoid rewriting functionality that already existed. Because that code was created by researchers in a much lower-security environment, it had hundreds of open source dependencies, and we needed approval for every one of them. That meant exactly this process, except we had to submit one set of requests to Raytheon's corporate approval folks and another to the customer's security team. The corporate approval is almost totally pointless: if corporate denies something but the customer says to use it anyway, the customer's decision always overrides the corporate one. The project doing this was an attempt to accomplish three things at the same time:

* Migrate a legacy GEOINT system from on-prem hardware to Amazon C2S (the CIA's private version of AWS)

* Migrate the distributed runtime from Apache Felix (a Java OSGi implementation) to Kubernetes

* Split the algorithmic part of the processing flow from the metadata retrieval and orchestration part and assign the algorithmic part to another contractor (which was even more pointless, because that contractor just sub-contracted the actual development work back to Raytheon because we were the only people on the planet with a realistic level of expertise to do it)

This went about as well as you might expect, with roughly zero people on the project knowing anything about how AWS worked, zero people knowing anything about how Kubernetes worked, C2S having a barren subset of AWS services, and the FOSS approvals taking nearly a year to work through the backlog before that part of the team could do anything except prototype shit in an isolated sandbox.

So yeah, I left years ago, but I think they finally ended up delivering a small subset of the originally intended functionality something like three years late.

Fossa built a business around this: https://fossa.com/