Software supply chain security | HN Mirror

	Software supply chain security (github.blog)
	83 points by mayakacz 2122 days ago

11 comments

westurner 2121 days ago

Estimates of prevalence do assume detection. How would we detect that a dependency that was installed a few deployments and reboots ago was compromised?

How does the classic infosec triad (Confidentiality, Integrity, Availability) apply to software supply chain security?

Confidentiality: Presumably we're talking about open source projects, which aren't confidential. Projects may request responsible disclosure in e.g. a security.txt, and vuln reports may be confidential for at least a little while.

Integrity: Secure transport protocols, checksums, and cryptographic code signing are ways to mitigate data integrity risks. GitHub supports SSH, 2FA, and GPG keys. Can all keys in the package signature keyring be used to sign any package? Can we verify a public key over a different channel? When we specify exact versions of software dependencies, can we also record package hashes which the package installer(s) will verify?
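
The last question above can be made concrete with a minimal sketch. It assumes a hypothetical lock table, `PINNED`, that maps an exact artifact filename to the SHA-256 digest recorded when the version was pinned; the package name and bytes are made up. pip has a native mode for this (`--require-hashes`); the code below only illustrates the check itself.

```python
import hashlib
import hmac
from typing import Dict

# Stand-in for the bytes of a downloaded artifact (made up for this example).
ARTIFACT = b"example package contents"

# Hypothetical lock data: artifact filename -> SHA-256 recorded at pin time.
PINNED: Dict[str, str] = {
    "example_pkg-1.0.0.tar.gz": hashlib.sha256(ARTIFACT).hexdigest(),
}


def verify_artifact(filename: str, data: bytes, pinned: Dict[str, str]) -> bool:
    """Return True only if the artifact is pinned and its digest matches."""
    expected = pinned.get(filename)
    if expected is None:
        return False  # unpinned artifacts are rejected, not trusted by default
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest does a constant-time comparison of the two hex strings
    return hmac.compare_digest(actual, expected)
```

Rejecting anything that is missing from the table is the important design choice: a hash check that silently skips unknown artifacts protects nothing.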

Availability: What are the internal and external data, network, and service dependencies for the development and deployment DevSecOps workflows? Can we deploy from local package mirrors? Who is responsible for securing and updating local package mirrors? Are these service dependencies all HA? Does everything in this system also depend upon the load balancer? Does our container registry support e.g. Docker Notary (TUF)? How should we mirror TUF package repos?

See also: "Guidance for [[transparent] proxy cache] partial mirrors?" https://github.com/theupdateframework/specification/issues/1...

jonahbenton 2121 days ago

A toolset that answers some of your questions is Grafeas, a metadata store at https://github.com/grafeas/grafeas, and Kritis, a policy engine at https://github.com/grafeas/kritis.

Cheers.

p932 2121 days ago

Thanks for the links. Do you know how this toolset helps to mitigate or prevent what the GitHub blog post calls "supply chain compromises"? I checked around quickly and couldn't find anything that applies to the dependencies of applications/binaries before they land in the target runtime (i.e. k8s).

jonahbenton 2120 days ago

Have you seen these preso slides?

https://www.slideshare.net/mobile/aysylu/q-con-sp-software-s....

They walk through one of the workflows (end state is deploying to k8s).

Grafeas is a metadata store. Kritis is a policy engine that plugs into k8s as an admission controller, blessing the "admission" (running) of an image in a namespace.

There are existing tools for each language/runtime that produce known vuln lists for individual artifacts in the language ecosystem. These you feed into Grafeas. And you have your CI pipeline providing manifests for each of your built images that contain all upstream dependencies (these produced from each app's build tool). Then at deploy time, Kritis checks the manifest on the image, and for each artifact in the image, checks for vulns and determines whether the vuln should keep the image from being deployed.
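
That deploy-time check can be sketched roughly as follows. This is not the Grafeas or Kritis API: the `Vulnerability` record, the in-memory `vuln_db`, and the severity threshold are all hypothetical stand-ins for the metadata store and the policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Vulnerability:
    ident: str       # hypothetical advisory ID
    severity: float  # assumed 0-10 score, higher is worse


# Hypothetical stand-in for the metadata store: (artifact, version) -> known vulns.
VulnDB = Dict[Tuple[str, str], List[Vulnerability]]


def admit(
    manifest: List[Tuple[str, str]],
    vuln_db: VulnDB,
    max_severity: float = 7.0,
) -> Tuple[bool, List[str]]:
    """Decide whether an image may be deployed, given its dependency manifest.

    Returns (admitted, reasons); any vuln at or above max_severity blocks the image.
    """
    blocking: List[str] = []
    for artifact in manifest:
        for vuln in vuln_db.get(artifact, []):
            if vuln.severity >= max_severity:
                blocking.append(f"{artifact[0]}=={artifact[1]}: {vuln.ident}")
    return (not blocking, blocking)
```

Returning the reasons alongside the decision lets the pipeline tell a developer which artifact blocked the deploy.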

Hope that helps. There are many other workflows but that one is the most direct.

Cheers.

chrisweekly 2121 days ago

OUTSTANDING comment; excellent questions. Bookmarked. Thanks for this concise high-level infosec punchlist.

qertoip 2121 days ago

This Sir is senior.

trishankdatadog 2121 days ago

Don't miss how we used TUF [1] and in-toto [2] to build compromise-resilient CI/CD (the first in the industry AFAICT) for the Datadog Agent integrations [3][4] that detects attacks anywhere between our developers and end-users

[1] https://theupdateframework.io/

[2] https://in-toto.io/

[3] https://www.youtube.com/watch?v=9hCiHr1f0zM

[4] https://dtdg.co/integrations-tuf-in-toto
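
A much-simplified sketch of the underlying idea: each pipeline step records digests of what it consumed (materials) and what it produced (products), and a verifier checks that nothing changed between steps. This does not use the in-toto library or its metadata format, signature checks are omitted, and the `Link` type and step names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    step: str
    materials: Dict[str, str]  # path -> digest consumed by the step
    products: Dict[str, str]   # path -> digest produced by the step


def chain_intact(links: List[Link]) -> bool:
    """True if every step consumed exactly what the previous step produced."""
    return all(prev.products == nxt.materials for prev, nxt in zip(links, links[1:]))
```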

p932 2121 days ago

How does this pattern/toolset protect against supply chain compromises of the dependencies used to build the "Datadog Agent" itself?

trishankdatadog 2121 days ago

Apply pattern/toolset recursively. Software supply chain problems largely eventually solved this way.

p932 2121 days ago

Is there any initiative in this direction, towards applying this pattern to the big dependency management tools (e.g. Maven, pip, npm)?

trishankdatadog 2119 days ago

Yes, please see PEP 458: https://www.python.org/dev/peps/pep-0458/

philips 2121 days ago

I think a big step forward is for folks to pin versions of things. NPM and pip and many other systems let software depend on a semantic versioning of their dependencies which makes it impossible to know what will be installed. If you at least know what is going to be installed and the URL is known then you can rely on a third party notary to tell you the expected contents...

Which is what we are building with Asset Transparency to provide a public transparency log backed database of URL content digests.

https://www.transparencylog.com

We have started to build tools for integrating into release pipelines too:

https://www.transparencylog.com/software-release-process-int...

I think it would be great to see package management systems use things like this. Go already does.

If anyone wants to get started quickly, check out our CLI tool:

https://github.com/transparencylog/tl
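
To illustrate the pinning point from the start of this comment: a small check that flags requirement lines which do not name one exact version. The regular expression is a deliberate simplification, not a full parser for pip's requirement syntax, and the function name is made up.

```python
import re
from typing import List

# Accepts only "name==version" with no wildcard in the version.
# Simplified on purpose: extras, markers, and --hash options are not handled.
_EXACT = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9._!+-]+$")


def unpinned(requirements: List[str]) -> List[str]:
    """Return the requirement specs that do not pin one exact version."""
    loose: List[str] = []
    for line in requirements:
        spec = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if spec and not _EXACT.match(spec):
            loose.append(spec)
    return loose
```

Run in CI, a check like this fails the build when someone adds a range such as `flask>=1.0`.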

p932 2121 days ago

How does this compare with https://github.com/theupdateframework/notary?

philips 2121 days ago

Notary is a signing scheme from the publisher. It is an improvement over GPG signing + a better scheme for signaling to clients the next version to update.

Asset Transparency doesn't require the publisher to be involved at all and can work on any URL on the internet that is publicly accessible. It is also complementary to signing schemes.

Here is the Asset Transparency CLI fetching and verifying the contents of a notary release for example:

    tl get https://github.com/theupdateframework/notary/releases/download/v0.6.1/notary-Linux-amd64

Or if you are curious hit the service’s lookup endpoint directly:

    curl http://beta-asset.transparencylog.net/lookup/github.com/theupdateframework/notary/releases/download/v0.6.1/notary-Linux-amd64
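
A toy model of the idea, assuming nothing about how the real service is implemented: the log records the digest of a URL's content the first time it is seen, and later fetches are checked against that record. A simple hash chain stands in here for the Merkle tree that real transparency logs typically use, and `DigestLog` and its methods are hypothetical names.

```python
import hashlib
from typing import List, Optional, Tuple


class DigestLog:
    """Toy append-only log of (url, sha256) entries linked by a hash chain."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, str]] = []
        self.heads: List[str] = []  # heads[i] commits to entries[0..i]

    def lookup(self, url: str) -> Optional[str]:
        """Return the digest logged for this URL, or None if it was never recorded."""
        for logged_url, digest in self.entries:
            if logged_url == url:
                return digest
        return None

    def record(self, url: str, content: bytes) -> str:
        """Log the digest the first time a URL is seen; return the logged digest."""
        existing = self.lookup(url)
        if existing is not None:
            return existing  # the first record wins; later content cannot replace it
        digest = hashlib.sha256(content).hexdigest()
        prev = self.heads[-1] if self.heads else ""
        self.entries.append((url, digest))
        self.heads.append(hashlib.sha256(f"{prev}|{url}|{digest}".encode()).hexdigest())
        return digest

    def verify(self, url: str, content: bytes) -> bool:
        """True if the content matches the digest logged for this URL."""
        return self.lookup(url) == hashlib.sha256(content).hexdigest()

    def consistent(self) -> bool:
        """Recompute the chain to detect edits to past entries."""
        prev = ""
        for (url, digest), head in zip(self.entries, self.heads):
            prev = hashlib.sha256(f"{prev}|{url}|{digest}".encode()).hexdigest()
            if prev != head:
                return False
        return True
```

Because the publisher never signs anything, the guarantee is only that everyone sees the same content for a URL, which is why it complements signing rather than replacing it.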

trishankdatadog 2121 days ago

Philip is right: they are complementary:

https://ssl.engineering.nyu.edu/blog/2020-02-03-transparent-...

brobdingnagians 2121 days ago

By default, have your firewall block _all_ outward connections. Only whitelist the ones you know you need. And as narrow as possible (i.e. specific hosts).
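
The same default-deny rule, sketched at application level rather than as real firewall configuration. The allowlist contents are an assumption for illustration; put in whatever hosts your build actually needs.

```python
from typing import FrozenSet, Tuple

# Hypothetical allowlist of (host, port) pairs the build is known to need.
ALLOWED_EGRESS: FrozenSet[Tuple[str, int]] = frozenset({
    ("pypi.org", 443),
    ("files.pythonhosted.org", 443),
})


def egress_allowed(
    host: str,
    port: int,
    allowlist: FrozenSet[Tuple[str, int]] = ALLOWED_EGRESS,
) -> bool:
    """Default deny: only exact host/port matches pass, no wildcards or suffix matching."""
    normalized = host.lower().rstrip(".")  # ignore case and a trailing root dot
    return (normalized, port) in allowlist
```

Exact matching is intentional: suffix rules like "anything under example.com" are how overly broad allowlists creep in.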

Minimize the number of dependencies. Systems that make it hard to add dependencies have the virtue of thinking harder about whether you want to add them. Having a few central libraries that do exactly what you need is better than drawing in the kitchen sink.

It is often easier to write a specific function that does precisely what you need than people think.

That is easier to change, and easier to maintain in the long run, than ingesting a huge library with its dependencies that do things you will probably never need.

snicker7 2121 days ago

Related: GNU Guix [0] and Bootstrappable Builds [1]. Guix tries to reproducibly bootstrap an entire Linux distribution from an ever-shrinking binary seed.

[0]: https://guix.gnu.org/

[1]: https://bootstrappable.org/

pornel 2121 days ago

I'm a proponent of distributed code reviews as a solution: https://github.com/crev-dev/crev

Ultimately, someone has to manually review the code. Antivirus-like heuristics won't catch everything. Sandboxing may prevent some exfiltration, but can't prevent malicious code from returning malicious results (e.g. imagine a password checking library modified to always accept the attacker's password: it can be sandboxed like a nuclear reactor and still screw you). If you verify the code is actually safe and does what it says, then it doesn't matter where the code came from, who wrote it, or which CI server published it.

But reviewing code is tedious. It's wasteful for every user to individually review the same code over and over again. You can trust code if enough people who you trust have reviewed it.
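
The "enough people who you trust" rule can be sketched as a quorum check. This is not crev's data model or API; the review map, the `name@version` keys, and the default quorum are assumptions for illustration.

```python
from typing import Dict, Set


def is_trusted(
    package: str,
    reviews: Dict[str, Set[str]],
    trusted_reviewers: Set[str],
    quorum: int = 2,
) -> bool:
    """True if at least `quorum` reviewers you trust have positively reviewed the package.

    `reviews` maps an exact "name@version" string to the reviewers who approved it.
    """
    endorsers = reviews.get(package, set()) & trusted_reviewers
    return len(endorsers) >= quorum
```

Keying reviews on an exact version matters: a review of 1.3.0 says nothing about 1.3.1.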

john61 2121 days ago

Just use software that is in Debian stable. If a library is not in Debian, then package it, become a Debian developer, and solve that problem for yourself and the thousands of other people who are affected.

Software supply chain security is a very old problem and is already solved. No need to reinvent the wheel for each generation of software developers.

dward 2121 days ago

Just run apt install billion-$$$-arr regulated-institution and write a systemd unit file, then run apt upgrade occasionally. What’s the problem?

Tainnor 2121 days ago

Maybe we need some sort of "trust model" for dependencies. E.g. if you depend on a package, you'll have to explicitly state that you trust it. Conversely, a package author may declare that not only are they responsible for their own code, they have also either only used trusted dependencies, or declare their own trust (e.g. by review) of certain dependencies, so that you can transitively build up a trust chain...

In practice, that would all be much more difficult, of course. But it would surface the underlying issue which is that while code reuse is fine and acceptable, using unvetted code is not.
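
A minimal sketch of such a trust chain, under assumed data structures: `explicit` is the set of packages you trust directly, `vouches` maps a package to the dependencies its author declares trust in, and `deps` is the dependency graph. All names are hypothetical.

```python
from typing import Dict, List, Set


def effective_trust(explicit: Set[str], vouches: Dict[str, Set[str]]) -> Set[str]:
    """Everything trusted directly, plus whatever trusted packages vouch for, transitively."""
    trusted = set(explicit)
    stack = list(explicit)
    while stack:
        pkg = stack.pop()
        for dep in vouches.get(pkg, set()):
            if dep not in trusted:
                trusted.add(dep)
                stack.append(dep)
    return trusted


def unvetted(
    root: str,
    deps: Dict[str, List[str]],
    explicit: Set[str],
    vouches: Dict[str, Set[str]],
) -> Set[str]:
    """Return every transitive dependency of `root` that no trust chain covers."""
    trusted = effective_trust(explicit, vouches)
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = list(deps.get(root, []))
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # also guards against dependency cycles
        seen.add(pkg)
        if pkg not in trusted:
            missing.add(pkg)
        stack.extend(deps.get(pkg, []))
    return missing
```

A build could refuse to proceed whenever `unvetted` returns a non-empty set, which is what surfaces the unvetted code.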

katsume3 2121 days ago

And more reading on this matter, relevant to CCleaner's supply chain mishap:

https://www.wired.com/story/inside-the-unnerving-supply-chai...

Something like BleachBit seems more reputable, though not necessarily immune to similar attacks; it's what I use instead of CCleaner.

mint2 2121 days ago

Interesting. I always thought a supply chain hack meant compromising one of the dependencies a product uses, not simply hacking the company itself and altering their product.

trishankdatadog 2121 days ago

Yes, I talked about it:

https://news.ycombinator.com/item?id=24371628

marcus_holmes 2121 days ago

I'm glad this is beginning to be taken seriously. And that the answer isn't "sandbox every dependency" which is ridiculous.

ryukafalz 2121 days ago

Why is that ridiculous? Certainly it’s not the sole answer to the problem on its own, but if I were to use (say) a string manipulation library, why should it have access to my filesystem and the internet?

marcus_holmes 2121 days ago

Because it's not an app. It's just some code.

You're going to have to segment your whole application into chunks, each chunk being sandboxed away from the others, causing huge overheads and complications. It'll generate more complexity, more errors, more security vulnerabilities. And it doesn't even guarantee that the code doesn't do other bad things that the sandbox doesn't deny. Sandboxing has comprehensively failed as a security measure for browser extensions - hence both Chrome and Firefox retreating from extensions.

Or, as a spurious example: you could audit the library's code to make sure it's not doing bad things, and then copy/paste it into your code base. You could even just copy the bits you need and leave the bits that deal with use cases that you don't need. Easier, simpler, more efficient and less dangerous.

ryukafalz 2121 days ago

>Because it's not an app. It's just some code.

Yes. So?

>You're going to have to segment your whole application into chunks, each chunk being sandboxed away from the others, causing huge overheads and complications. It'll generate more complexity, more errors, more security vulnerabilities.

I'm going to dispute this. Yes, if your sandbox takes a ton of memory to isolate some piece of code, scaling that up to confine each module individually isn't going to be workable. But who says a sandbox has to be heavyweight?

Our current systems (UNIX-likes, etc) provide a ton of ambient authority to each process; given that, it takes a lot of effort to e.g. intercept syscalls and decide whether or not the application should have access to them. That's an artifact of design decisions from decades ago, though; let's say we were starting from scratch, why give every process access to all those syscalls to begin with? If you want an example of how a system could be designed from the start without that authority, take a look at this paper: http://mumble.net/~jar/pubs/secureos/secureos.html

For a recent attempt at doing essentially this, take a look at this intro to the Bytecode Alliance: https://hacks.mozilla.org/2019/11/announcing-the-bytecode-al...
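
The difference between ambient authority and explicitly granted authority can be illustrated in a few lines. Plain Python does not enforce any of this (any module can still import `os`), so read it as a sketch of the calling convention only; the in-memory `files` dict stands in for a filesystem and all names are made up.

```python
from typing import Callable, Dict


def make_reader(files: Dict[str, str], allowed_prefix: str) -> Callable[[str], str]:
    """Build a read capability limited to paths under `allowed_prefix`."""
    def read(path: str) -> str:
        if not path.startswith(allowed_prefix):
            raise PermissionError(f"no authority over {path}")
        return files[path]
    return read


def shout(text: str) -> str:
    """A string library function: it is handed no capabilities at all."""
    return text.upper()


def load_config(read: Callable[[str], str]) -> str:
    """Needs file access, so the caller hands it exactly one narrow capability."""
    return read("/app/config.txt").strip()
```

In a system built this way, the string library simply has nothing to call to reach the filesystem or the network, without needing a process boundary.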

marcus_holmes 2120 days ago

>Yes. So?

There's a difference between compiled binary and uncompiled code. I guess if you're working in an interpreted language that never gets compiled, like Python, you might not notice the difference so much. But even then, this is not using an API for a separate service that exists on a different server. This is something that happens in your process.

> ...processes...

If your string processing library has to live in a separate process in order to sandbox it, then yes, you are creating more problems than you're solving.

trabant00 2121 days ago

Please stop with the blog spam, upvote astroturfing and "great article/comment!" stuff. I understand you are paid for evangelism, but just take it somewhere else.