Estimates of prevalence do assume detection. How would we detect that a dependency that was installed a few deployments and reboots ago was compromised?
How does the classic infosec triad (Confidentiality, Integrity, Availability) apply to software supply chain security?
Confidentiality: Presumably we're talking about open source projects, which aren't confidential. Projects may request responsible disclosure, e.g. in a security.txt, and vuln reports may be confidential for at least a little while.
Integrity: Secure transport protocols, checksums, and cryptographic code signing are ways to mitigate data integrity risks. GitHub supports SSH, 2FA, and GPG keys. Can all keys in the package signature keyring be used to sign any package? Can we verify a public key over a different channel? When we specify exact versions of software dependencies, can we also record package hashes which the package installer(s) will verify?
Availability: What are the internal and external data, network, and service dependencies for the development and deployment DevSecOps workflows? Can we deploy from local package mirrors? Who is responsible for securing and updating local package mirrors? Are these service dependencies all HA? Does everything in this system also depend upon the load balancer? Does our container registry support e.g. Docker Notary (TUF)? How should we mirror TUF package repos?
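On the integrity question of recording package hashes next to exact versions: here is a minimal sketch of what the installer-side check amounts to. The `PINNED_SHA256` table, the artifact filename, and the function names are all made up for illustration; a real installer would read the digests from a lock or requirements file.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Mapping

# Hypothetical lock data: artifact filename -> expected SHA-256 hex digest.
# The digest is computed from placeholder bytes so the sketch is self-contained.
PINNED_SHA256 = {
    "example_pkg-1.2.3.tar.gz": hashlib.sha256(b"example artifact bytes").hexdigest(),
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned: Mapping[str, str] = PINNED_SHA256) -> bool:
    """Accept the artifact only if its name is pinned and its digest matches."""
    expected = pinned.get(path.name)
    if expected is None:
        return False  # default-deny: artifacts without a recorded hash are rejected
    return hmac.compare_digest(sha256_of(path), expected)
```

pip has a mode along these lines (`--require-hashes`, with `--hash=sha256:...` entries in the requirements file). Note that a hash only proves you got the same bytes that were recorded; it says nothing about whether those bytes were trustworthy in the first place.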
Thanks for the links. Do you know how this toolset helps to mitigate/prevent what the GitHub blog post calls "supply chain compromises"?
I quickly checked around and couldn't find anything that applies to the dependencies of applications/binaries before they land in the target runtime (i.e. k8s).
They walk through one of the workflows (end state is deploying to k8s).
Grafeas is a metadata store; Kritis is a policy engine that plugs into k8s as an admission controller, blessing the "admission" (running) of an image in a namespace.
There are existing tools for each language/runtime that produce known vuln lists for individual artifacts in the language ecosystem. These you feed into Grafeas. And you have your CI pipeline providing manifests for each of your built images that contain all upstream dependencies (these produced from each app's build tool). Then at deploy time, Kritis checks the manifest on the image, and for each artifact in the image, checks for vulns and determines whether the vuln should keep the image from being deployed.
Hope that helps. There are many other workflows but that one is the most direct.
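To make the deploy-time check described above concrete, here is a rough sketch of the decision logic only. It does not use the real Grafeas or Kritis APIs; `KNOWN_VULNS`, `Artifact`, `admit`, the severity scale, and the vuln IDs are hypothetical stand-ins for the metadata store, the image manifest, and the policy.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


@dataclass(frozen=True)
class Artifact:
    name: str
    version: str


# Hypothetical stand-in for the metadata store:
# artifact -> list of (vulnerability id, severity) reported by ecosystem scanners.
KNOWN_VULNS = {
    Artifact("libfoo", "1.0.0"): [("VULN-0001", "CRITICAL")],
    Artifact("libbar", "2.3.1"): [("VULN-0002", "LOW")],
}


def admit(manifest, max_severity="MEDIUM", allowlist=frozenset()):
    """Return (admitted, blocking) for an image manifest.

    A vulnerability blocks the image when it is not explicitly allowlisted
    and its severity is above the policy's max_severity.
    """
    limit = SEVERITY_ORDER[max_severity]
    blocking = []
    for artifact in manifest:
        for vuln_id, severity in KNOWN_VULNS.get(artifact, []):
            if vuln_id in allowlist:
                continue  # policy owner has accepted this specific finding
            if SEVERITY_ORDER[severity] > limit:
                blocking.append((artifact, vuln_id))
    return (not blocking, blocking)
```

For example, `admit([Artifact("libfoo", "1.0.0")])` refuses the image, while passing `allowlist={"VULN-0001"}` lets it through. This only catches known vulns in listed artifacts; an artifact missing from the manifest is invisible to it.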
Don't miss how we used TUF [1] and in-toto [2] to build compromise-resilient CI/CD (the first in the industry AFAICT) for the Datadog Agent integrations [3][4], which detects attacks anywhere between our developers and end-users.
I think a big step forward is for folks to pin versions of things. NPM, pip, and many other systems let software declare a semantic version range for its dependencies, which makes it impossible to know in advance exactly what will be installed. If you at least know what is going to be installed and the URL is known, then you can rely on a third-party notary to tell you the expected contents...
Which is what we are building with Asset Transparency to provide a public transparency log backed database of URL content digests.
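As a small illustration of the pinning point: a sketch that flags pip-style requirement lines that are not exact `name==version` pins. The regex and the `unpinned_requirements` helper are my own simplification, not part of pip, and they assume one requirement per line (no URLs or editable installs).

```python
import re

# Exact pin: "name==version", optionally with extras such as "name[extra]==1.0".
# Wildcards ("1.2.*") and range operators (">=", "~=", "<") do not match.
_EXACT_PIN = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[A-Za-z0-9][A-Za-z0-9.!+_-]*$"
)


def unpinned_requirements(text):
    """Return the requirement specs in `text` that are not pinned to one exact version."""
    loose = []
    for raw in text.splitlines():
        # Drop comments and trailing line-continuation backslashes.
        line = raw.split("#", 1)[0].strip().rstrip("\\").strip()
        if not line or line.startswith("-"):
            continue  # blank lines and options such as "-r other.txt" or "--hash=..."
        # Drop environment markers and inline hashes, then remove inner whitespace.
        spec = line.split(";", 1)[0].split("--hash", 1)[0]
        spec = "".join(spec.split())
        if not _EXACT_PIN.match(spec):
            loose.append(spec)
    return loose
```

Running something like this in CI is cheap. It doesn't help with transitive dependencies unless the file is a full lock of everything that gets installed.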
Notary is a signing scheme from the publisher. It is an improvement over GPG signing + a better scheme for signaling to clients the next version to update.
Asset Transparency doesn't require the publisher to be involved at all and can work on any publicly accessible URL on the internet. It is also complementary to signing schemes.
Here is the Asset Transparency CLI fetching and verifying the contents of a notary release for example:
tl get https://github.com/theupdateframework/notary/releases/download/v0.6.1/notary-Linux-amd64
Or if you are curious hit the service’s lookup endpoint directly:
By default, have your firewall block _all_ outward connections. Only whitelist the ones you know you need. And as narrow as possible (i.e. specific hosts).
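The policy itself is tiny; the hard part is maintaining the list. A sketch of the default-deny check, with a made-up `ALLOWED_EGRESS` set (the two hosts are just examples of what a Python build box might need):

```python
# Hypothetical egress allowlist: exact (host, port) pairs, nothing else gets out.
ALLOWED_EGRESS = frozenset({
    ("pypi.org", 443),
    ("files.pythonhosted.org", 443),
})


def egress_allowed(host, port, allowed=ALLOWED_EGRESS):
    """Default-deny: permit a connection only if (host, port) is explicitly listed."""
    normalized = host.lower().rstrip(".")  # hostnames are case-insensitive; drop root dot
    return (normalized, port) in allowed
```

Matching on exact hosts rather than wildcard domains is deliberate here, since "as narrow as possible" is the whole point. A real firewall would enforce this at the network layer, not in application code.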
Minimize the number of dependencies. Systems that make it hard to add dependencies have the virtue of thinking harder about whether you want to add them. Having a few central libraries that do exactly what you need is better than drawing in the kitchen sink.
It is often easier to write a specific function that does precisely what you need than people think.
That is easier to change, and easier to maintain in the long run, than ingesting a huge library with its dependencies that do things you will probably never need.
Related: GNU Guix [0] and Bootstrappable Builds [1]. Guix tries to reproducibly bootstrap an entire Linux distribution from an ever-shrinking binary seed.
Ultimately, someone has to manually review the code. Antivirus-like heuristics won't catch everything. Sandboxing may prevent some exfiltration, but can't prevent malicious code from returning malicious results (e.g. imagine a password checking library modified to always accept the attacker's password: it can be sandboxed like a nuclear reactor and still screw you). If you verify the code is actually safe and does what it says, then it doesn't matter where the code came from, who wrote it, or which CI server published it.
But reviewing code is tedious. It's wasteful for every user to individually review the same code over and over again. You can trust code if enough people who you trust have reviewed it.
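One way to read "enough people who you trust have reviewed it" is as a quorum over review records keyed by content digest. A sketch under that assumption; the record format, the `"ok"` verdict string, and the default quorum of 2 are all invented for illustration:

```python
from collections import defaultdict


def trusted_digests(reviews, trusted_reviewers, quorum=2):
    """Return digests approved by at least `quorum` distinct trusted reviewers.

    `reviews` is an iterable of (reviewer, digest, verdict) tuples. Reviews from
    people outside `trusted_reviewers` are ignored, and any non-"ok" verdict
    from a trusted reviewer vetoes that digest.
    """
    approvals = defaultdict(set)
    rejected = set()
    for reviewer, digest, verdict in reviews:
        if reviewer not in trusted_reviewers:
            continue
        if verdict == "ok":
            approvals[digest].add(reviewer)  # a set, so repeat reviews count once
        else:
            rejected.add(digest)
    return {
        digest
        for digest, reviewers in approvals.items()
        if len(reviewers) >= quorum and digest not in rejected
    }
```

Keying on the digest rather than on a name and version matters: the review then applies to exactly the bytes that were read, wherever they were downloaded from.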
Just use software that is in Debian stable.
If a library is not in Debian, then package it, become a Debian developer, and solve that problem for yourself and the thousands of other people who are affected.
The software supply chain is a very old problem, and it is already solved. No need to reinvent the wheel for each generation of software developers.
Maybe we need some sort of "trust model" for dependencies. E.g. if you depend on a package, you'll have to explicitly state that you trust it. Conversely, a package author may declare that not only are they responsible for their own code, they have also either only used trusted dependencies, or declare their own trust (e.g. by review) of certain dependencies, so that you can transitively build up a trust chain...
In practice, that would all be much more difficult, of course. But it would surface the underlying issue which is that while code reuse is fine and acceptable, using unvetted code is not.
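A toy version of the "explicitly state that you trust it" idea: walk the dependency graph and report everything reachable that nobody has declared trust in. The graph shape, the package names, and the `untrusted_closure` helper are hypothetical; a real tool would get the graph from the package manager's lock data.

```python
def untrusted_closure(root, deps, trusted):
    """List every transitive dependency of `root` that is not in `trusted`.

    `deps` maps a package name to the names it depends on directly.
    The walk tolerates cycles and visits each package once.
    """
    seen = set()
    missing = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        if pkg != root and pkg not in trusted:
            missing.add(pkg)
        stack.extend(deps.get(pkg, ()))
    return sorted(missing)
```

With `deps = {"app": ["a", "b"], "a": ["c"]}` and `trusted = {"a", "b"}`, this reports `["c"]`: you vetted `a`, but not what `a` drags in. A build could refuse to proceed until that list is empty.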
Interesting. I always thought a supply chain hack meant compromising one of the dependencies a product uses, not simply hacking the company itself and altering their product.
Why is that ridiculous? Certainly it’s not the sole answer to the problem on its own, but if I were to use (say) a string manipulation library, why should it have access to my filesystem and the internet?
You're going to have to segment your whole application into chunks, each chunk being sandboxed away from the others, causing huge overheads and complications. It'll generate more complexity, more errors, more security vulnerabilities. And it doesn't even guarantee that the code doesn't do other bad things that the sandbox doesn't deny. Sandboxing has comprehensively failed as a security measure for browser extensions - hence both Chrome and Firefox retreating from extensions.
Or, as a spurious example: you could audit the library's code to make sure it's not doing bad things, and then copy/paste it into your code base. You could even just copy the bits you need and leave the bits that deal with use cases that you don't need. Easier, simpler, more efficient and less dangerous.
>You're going to have to segment your whole application into chunks, each chunk being sandboxed away from the others, causing huge overheads and complications. It'll generate more complexity, more errors, more security vulnerabilities.
I'm going to dispute this. Yes, if your sandbox takes a ton of memory to isolate some piece of code, scaling that up to confine each module individually isn't going to be workable. But who says a sandbox has to be heavyweight?
Our current systems (UNIX-likes, etc) provide a ton of ambient authority to each process; given that, it takes a lot of effort to e.g. intercept syscalls and decide whether or not the application should have access to them. That's an artifact of design decisions from decades ago, though; let's say we were starting from scratch, why give every process access to all those syscalls to begin with? If you want an example of how a system could be designed from the start without that authority, take a look at this paper: http://mumble.net/~jar/pubs/secureos/secureos.html
There's a difference between compiled binary and uncompiled code. I guess if you're working in an interpreted language that never gets compiled, like Python, you might not notice the difference so much. But even then, this is not using an API for a separate service that exists on a different server. This is something that happens in your process.
> ...processes...
If your string processing library has to live in a separate process in order to sandbox it, then yes, you are creating more problems than you're solving.
Please stop with the blog spam, upvote astroturfing and "great article/comment!" stuff. I understand you are paid for evangelism, but just take it somewhere else.
See also, on mirroring TUF package repos: "Guidance for [[transparent] proxy cache] partial mirrors?" https://github.com/theupdateframework/specification/issues/1...