by fasterdom 2539 days ago
We need a sort of capability and permission method for libraries.

For example, a "strong_password" library should only be given "CPU compute" permissions, no I/O.

But even with this, the problem will be like what we see on phones: popular libraries will require all the permissions.

You'll want to install React, and React + its 100 dependencies will request everything.

16 comments

To be honest, even the coarsest-possible permissions of "can do I/O" vs. "can't do I/O" would be exceedingly effective at stymieing these sorts of attacks; all malicious software of this sort needs to do I/O at some point, and relatively few libraries actually have a good excuse to do I/O (though logging might be thorny).

That said, it seems easier said than done to impose those sorts of restrictions on a per-dependency basis. Attempts to statically verify the absence of I/O sound like a great game of whack-a-mole, and I don't know how you'd do it dynamically without running all non-I/O dependencies in an entirely separate process from the main program.

> few libraries actually have a good excuse to do I/O (though logging might be thorny).

Yeah, logging would be tricky...

Maybe a "logging" capability could be created. Separated from other I/O.

Such a capability would be weird, and nonstandard, and messy, cutting across several abstraction layers. But if pulled off, it might be worth the effort.

That's solved in similar frameworks by separating open and read/write. You open (or inherit from somewhere) a logging socket, drop the open privileges, retain the permission to write to the log socket.
This discussion is basically inventing a per-library pledge(2).
Or AppArmor, SELinux, grsec, TOMOYO, ... But those systems can't be integrated into a scripting language's per-library use case without some serious thread / IPC overhead.
These others can achieve what's intended, but the entire flavour of the discussion is a dead ringer for pledge's purpose and interface, which is much simpler and very much internal to the software (a self-check of sorts).
Haskell indirectly solves this by separating `trace` (a form of logging) from IO (`trace` is a debugging function that logs a message from otherwise pure code, while all other IO must be contained in the IO monad).
> That said it seems easier said than done to impose those sorts of restrictions on a per-dependency basis.

Isn't this the sort of thing type inference is made for? Along with return types, functions have an io type if they're marked (std lib) or if they contain a marked function. Otherwise they have the pure type.
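
As a rough illustration of that inference rule (not how any real type checker implements it), the marking can be treated as a fixed-point computation over a call graph. The function names, the `CALLS` graph, and the `MARKED_IO` set below are all made up for the example.

```python
from typing import Dict, Set

# Hypothetical call graph: function name -> functions it calls.
CALLS: Dict[str, Set[str]] = {
    "score_password": {"entropy", "has_common_prefix"},
    "entropy": set(),
    "has_common_prefix": set(),
    "load_wordlist": {"open"},
    "check_against_wordlist": {"load_wordlist", "entropy"},
}

# Functions the "std lib" marks as doing I/O (an assumed set, for illustration).
MARKED_IO: Set[str] = {"open", "print", "socket.connect"}


def infer_io(calls: Dict[str, Set[str]], marked: Set[str]) -> Set[str]:
    """Return every function that is marked or transitively calls a marked one."""
    io = set(marked)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for fn, callees in calls.items():
            if fn not in io and callees & io:
                io.add(fn)
                changed = True
    return io


io_functions = infer_io(CALLS, MARKED_IO)
pure_functions = set(CALLS) - io_functions
```

Here `check_against_wordlist` ends up with the io type only because it reaches `open` through `load_wordlist`; everything else stays pure. The hard part in a dynamic language is building a trustworthy call graph in the first place.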

Doing this usefully does require more than just “does IO” — e.g. does that mean it can load another module, read a list of too-common passwords, write to a log file, or read your ~/.aws/credentials? Similarly, does allowing networking mean it can talk to anything or just a few well-known hostnames and ports?

This isn’t to say that it’s a bad idea but there are a ton of details which get annoying fast. I know the Rust community was looking into the options after the last NPM hijack was in the news but it sounded like it’d take years to make it meaningfully better.

If you're running Haskell. Few other languages can do it.
> running all non-I/O dependencies in an entirely separate process from the main program.

Maybe that's not such a bad idea. This "strong_password" thing is written in Ruby, a few milliseconds delay is probably not noticeable anyway and vastly preferable given the security implications.
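
A minimal sketch of what that could look like, written in Python rather than Ruby to keep it short. The `WORKER` script and the `is_strong` helper are hypothetical; a toy length/digit check stands in for the real library.

```python
import json
import subprocess
import sys

# Code run in the child process. In a real setup this is where the untrusted
# library would be imported; a toy check stands in for it here.
WORKER = r"""
import json, sys
request = json.loads(sys.stdin.read())
pw = request["password"]
strong = len(pw) >= 12 and any(c.isdigit() for c in pw)
sys.stdout.write(json.dumps({"strong": strong}))
"""


def is_strong(password: str, timeout: float = 5.0) -> bool:
    """Ask a short-lived child process for a verdict over stdin/stdout."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", WORKER],  # -I: isolated mode
        input=json.dumps({"password": password}),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return bool(json.loads(proc.stdout)["strong"])
```

Note that the process boundary by itself does not stop the child from doing I/O. It only gives you a place to apply an OS-level restriction (seccomp, pledge, a sandbox profile) before the library code runs, and it keeps the library from redefining things in the parent.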

Particularly in Ruby, where your code can pretty much redefine anything anywhere else in the code whenever it wants.
A whole lot of security is playing whack-a-mole at the end of the day.
The design of macOS and iOS has been moving this way. Many of Apple's first-party applications and frameworks have been broken down into backend "XPC services" that (attempt to) follow the principle of least privilege[1]. Each service runs in a separate process, the system enforcing memory isolation and limiting access to resources (sandboxing).

It's a good idea on paper, but has caveats. Every service is responsible for properly authenticating its clients, and needs to be designed so that a compromised client cannot leverage its access to a service to elevate privileges. Sandboxes are difficult to retrofit onto existing programs. The earlier, lowest-common-denominator system frameworks were not originally written with sandboxing in mind. There are numerous performance drawbacks.

For Apple ecosystem developers, XPC services are also how "extensions" for VPN, Safari ad blockers, etc. are written, for a mix of security and stability benefits.

Though funnily enough, as Apple has pursued these technologies, many HN commenters have decried the walls of the garden closing in.

1: https://en.wikipedia.org/wiki/Principle_of_least_privilege

Hm, interesting. One way to solve this would be to have a language with a very rigid import system - it should be _impossible_ for a library to use a module it hasn't imported, even if that module has been loaded elsewhere in a process. This is probably harder than it looks, and many languages have introspection features that are incompatible with this goal.

With a rigid import system, each library would be forced to declare what it's going to import (including any system libraries), and then you could e.g. enforce a warning + confirmation any time an updated dependency changes its import list.
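
For a language with static import statements, the "changed import list" check might be sketched like this, using Python's `ast` module. The `OLD` and `NEW` sources are invented stand-ins for two releases of a hypothetical package.

```python
import ast
from typing import Set


def imported_modules(source: str) -> Set[str]:
    """Collect top-level module names from static import statements."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which stay inside the package
            found.add(node.module.split(".")[0])
    return found


def new_imports(old_source: str, new_source: str) -> Set[str]:
    """Modules the new version imports that the old one did not."""
    return imported_modules(new_source) - imported_modules(old_source)


# Made-up sources for two releases of a hypothetical password checker.
OLD = "import math\nimport re\n"
NEW = "import math\nimport re\nimport urllib.request\nfrom os import environ\n"

added = new_imports(OLD, NEW)  # the set a tool would warn about
```

This only sees static imports. Anything loaded through `__import__`, `importlib`, or `eval` slips past it, which is the introspection problem mentioned above.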

It doesn't prevent you from getting owned by a modified privileged library, but it's better than the current case. Unfortunately, it probably requires some language (re-)design to fully implement this approach.

> With a rigid import system, each library would be forced to declare what it's going to import (including any system libraries), and then you could e.g. enforce a warning + confirmation any time an updated dependency changes its import list.

Which means you would get warnings on pretty much any functional upgrade of most dependencies, which would make the whole system useless from a security point of view.

In theory, a point release of a library really shouldn’t be requiring new permissions, and you shouldn’t be randomly upgrading your code to newer major versions without checking for compatibility anyway.

Why should a functional upgrade of a dependency introduce new dependencies anyway? A library that sets out to do a particular thing shouldn’t grow new features that require new capabilities willy-nilly.

> Why should a functional upgrade of a dependency introduce new dependencies anyway? A library that sets out to do a particular thing shouldn’t grow new features that require new capabilities willy-nilly.

Why not? I've often done upgrades with the sole purpose of replacing questionable, hand-written code with external dependencies I've discovered that do the same thing, but better (more features, more tests, more eyes on the code, more fixed issue reports than my often-closed-source code). From string parsing to networking, this happens a lot. The external contracts of my libraries don't change a bit, so why waste a major version? "I'm using someone else's code instead of what I YOLO'd myself" seems like a poor reason to rev a package version--and even if it's not, where do you draw the line? Cribbing code from StackOverflow?

this reminds me of the Boeing 737 MAX8...
> Hm, interesting. One way to solve this would be to have a language with a very rigid import system - it should be _impossible_ for a library to use a module it hasn't imported, even if that module has been loaded elsewhere in a process. This is probably harder than it looks, and many languages have introspection features that are incompatible with this goal.

This _should_ be achievable with Go.

If you look at dependencies as black-boxes that contain their own transitive dependencies, then sure, any given "root-level" dependency of sufficient complexity might end up requesting every permission.

On the other hand, if each dependency in the deps tree had its own required permissions, and you had to grant those permissions to that specific dependency rather than to the rootmost branch of the deps tree that contained it, then things would be a lot nicer. The more fine-grained library authors were in splitting out dependencies, the clearer the permissions situation would be; it'd be clear that e.g. a "left-pad" package way down in the tree wouldn't need any system access.

On the other hand, it'd make sense if dependencies could only add new transitive dependencies during "version update due to automatic version-constraint re-evaluation" if the computed transitive closure of the required permissions didn't increase. Otherwise it'd stop and ask you whether you wanted to authorize the addition of a dep that now asked for these additional permissions.
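
A sketch of that check, assuming each package declares a set of permission names and the resolver knows the dependency graph before and after the update. The package names and permission strings are hypothetical.

```python
from typing import Dict, Set

Deps = Dict[str, Set[str]]   # package -> direct dependencies
Perms = Dict[str, Set[str]]  # package -> permissions it declares


def closure_permissions(root: str, deps: Deps, perms: Perms) -> Set[str]:
    """Union of permissions requested by root and everything it pulls in."""
    seen: Set[str] = set()
    stack = [root]
    required: Set[str] = set()
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        required |= perms.get(pkg, set())
        stack.extend(deps.get(pkg, set()))
    return required


def added_permissions(root: str, old_deps: Deps, old_perms: Perms,
                      new_deps: Deps, new_perms: Perms) -> Set[str]:
    """Permissions an update would add; an empty set means auto-approve."""
    before = closure_permissions(root, old_deps, old_perms)
    after = closure_permissions(root, new_deps, new_perms)
    return after - before


old_deps = {"app": {"framework"}, "framework": {"left-pad"}}
old_perms = {"framework": {"net"}, "left-pad": set()}
# The update pulls in a new transitive dependency that wants file writes.
new_deps = {"app": {"framework"}, "framework": {"left-pad", "color-log"}}
new_perms = {"framework": {"net"}, "left-pad": set(), "color-log": {"fs-write"}}

extra = added_permissions("app", old_deps, old_perms, new_deps, new_perms)
```

If `extra` is non-empty, the resolver would stop and ask; a new dependency that needs nothing beyond what the tree already had goes through silently.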

It's also worth noting that under this system, if you trust a large library like React, but don't trust its dependencies, you might still trust that React is sandboxing its own imports correctly -- and then you could "inherit" React's permissions and be fine without overriding anything.

If you're really worried, then you still could go over your entire tree and override the default settings. But there's nothing that would mean you would be required to do that.

People are thinking about this using the phone/website model, where permissions are only applied at one level. With dependencies, whatever giant framework that you're pulling in could be using the same permissions system to secure its own dependencies, which would make you significantly safer.

Under the current system, you have to hope that none of the authors in your dependency chain make a mistake and get compromised. If everybody can sandbox anything, then you only have to hope that most of those authors don't make a mistake.

If somebody attaches malware to a dependency of a dependency, and if even one person along that chain is following best practices and saying, "yeah, I don't think this needs a special permission", then they've likely just prevented that attack from affecting anyone else deeper down the dependency chain.
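
To make that concrete, here is a sketch that assumes the enforcement rule is "a package can never hand a dependency more than it holds itself", so the effective permissions are the intersection of the grants along the chain. The package names and the `GRANTS` table are invented.

```python
from typing import Dict, List, Set

ALL: Set[str] = {"net", "fs-read", "fs-write", "exec"}

# What each package grants to each direct dependency (hypothetical names).
GRANTS: Dict[str, Dict[str, Set[str]]] = {
    "app": {"framework": {"net", "fs-read"}},
    "framework": {"templating": {"fs-read"}},
    "templating": {"string-utils": set()},          # the one careful author
    "string-utils": {"compromised-dep": set(ALL)},  # careless: grants everything
}


def effective_permissions(path: List[str]) -> Set[str]:
    """Intersect the grants along a chain that starts at the root package."""
    allowed = set(ALL)  # the root application starts with everything
    for parent, child in zip(path, path[1:]):
        allowed &= GRANTS[parent].get(child, set())
    return allowed


chain = ["app", "framework", "templating", "string-utils", "compromised-dep"]
effective = effective_permissions(chain)
```

Even though `string-utils` grants everything, `effective` is empty, because `templating` granted it nothing to pass along.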

Sandboxing in package managers is something that could actually scale pretty well; much better than it does for websites/phones/computers.

That seems like a strategy that would cause significant slowdowns and hassles in development.

High-level (i.e. consuming a lot of dependencies at a lot of levels) tools would simply apply an "allow everything" dependency policy rather than deal with tons of issue reports from people who wanted to import the high-level library in a less-than-root-permissioned project.

Additionally, lots of upgrades do increase the dependency surface. Resolving local usernames is a pretty fundamental thing a lot of dependencies would need. Now consider the libc switch from resolving names via /etc/passwd to resolving from multiple sources (including nslcd, a network/local-network service). If every dependency up the tree adopted a "lowest possible needed IO surface" permission model and then that change happened, there would be hell to pay: maintainers would take the shortest path and open up too many permissions; maintainers wouldn't upgrade and leave some packages trapped in a no-man's-land; or maintainers would give up on pulling in prone-to-changing-permissions dependencies, leading to even more fragmentation.

This idea is baked into the core of Deno. See, for example, https://deno.land/manual.html#permissionswhitelist.
Safe Haskell, a GHC extension, is one example in this space. https://downloads.haskell.org/~ghc/latest/docs/html/users_gu...

Its biggest selling point is that a lot of capability safety could be inferred in packages without the package author separately specifying capabilities.

The basic idea is to disallow the remaining impure escape hatches in Haskell in most code, requiring authors of libraries that do need those escape hatches (e.g. wrappers around C libraries) to assert that their library is trustworthy, and requiring users to accept that trustworthy declaration in a per-user database.

It actually was very promising because the general coding conventions within Haskell libraries made most of them automatically safe, so the set of packages you needed to manually verify wasn't insane (but still unfortunately not a trivial burden, especially if your packages relied on a lot of C FFI).

Unfortunately I have yet to see it used in any commercial projects and it seems in general not to get as much attention as some other GHC extensions.

I know this is about ruby, but it's worth noting that this kind of thing would be solved by effect systems, e.g. Haskell's IO type. If IO isn't part of the signature, you know it's cpu only. Furthermore, you can get more specific such as having a DB type to indicate some code only has access to databases rather than the internet as a whole.
I think you'd also need to prevent things like unsafePerformIO, and equivalent loopholes.
While that might be true, you are not going to switch the world to program in Haskell.

We need a solution which also works for most used languages, JS/C++/Java/Python..., which suggests that it should be done at a higher level, maybe with OS involvement somehow.

Java actually has a pretty useful and powerful SecurityManager concept that nearly no one uses :/
Ruby itself did have something akin to this known as SAFE levels, which prevented IO, exiting the program, etc: https://ruby-hacking-guide.github.io/security.html

Unfortunately, it seems like it's been removed since Ruby 2.1: https://bugs.ruby-lang.org/issues/8468

The shame is that they would have played nicely with the upcoming "guilds" stuff, IMHO.
The .NET Framework 1.0 included "Code Access Security", which provided mechanisms to authenticate code with "evidence" (as opposed to traditional 'roles') and then apply permissions similar to your example: DnsPermission, FileIOPermission, RegistryPermission, UIPermission, and so on.

Unfortunately, the architecture was too complex for most developers and fell to the wayside. It was finally removed from the 4.0 Framework after being deprecated for some time.

Sources:

https://www.itwriting.com/blog/2156-the-end-of-code-access-s...

https://www.codemag.com/Article/0405031/Managing-.NET-Code-A...

https://blog.codinghorror.com/code-access-security-and-bitfr...

So, we need a version of pledge from OpenBSD that can surround components / classes https://man.openbsd.org/pledge.2 https://www.youtube.com/watch?v=bXO6nelFt-E
Linux has seccomp for the same purpose. The most restrictive mode of seccomp permits only read, write and exit, which is good for a jailed CPU-only process (read/write commands from a pipe and exit when done - no opening new files or sockets).
There is a bunch of work going on for this in JavaScript; see https://www.infoq.com/news/2019/06/making-npm-install-safe/ for links.
Couldn't you theoretically shove all of your untrusted "non-I/O" libraries into a Service Worker? They wouldn't have direct access to the DOM or network I/O that way. It would involve writing some glue code, but perhaps it's worth trading that off for increased "security" (trust)?

EDIT: never mind, looks like I was mistaken about the network i/o part of this... Might be interesting to have a browser-level "sandboxed service worker" for this purpose though...

The skeptic in me thinks that it's never going to work in practice due to 'worse is better': any ecosystem with an 'I/O vs. no-I/O' permission system will have more friction than one without it, and there is no measurable benefit until you get hacked, so most people will not use it (or will declare everything as I/O).
That is a brilliant idea. I'm surprised I haven't heard/thought of that yet.
We can't retrofit this onto an existing community and code base... see Python 3 for details. People just won't make extensive changes to their code base if they can't see an immediate, tangible benefit.
For some languages it might be possible to enforce this with just a simple linter.