by quotemstr 363 days ago
Yet people use container based isolation all the time in practice and the sky doesn't fall.

Also, every security domain in an Android system shares a kernel, yet Android is one of the most secure systems out there. Sure, it uses tons of SELinux, but so what? It still has a shared kernel, and a quite featureful one at that.

I don't buy the idea that we can't do intra-kernel security isolation and so we shouldn't care about local privilege escalation.

4 comments

Android delegated some security features to a different kernel called Trusty that is separated from the main Linux kernel using virtualisation. That kernel runs high value security services.

https://source.android.com/docs/security/features/trusty

Yes, but that's not the main load-bearing security part of the system. Trusty doesn't isolate apps from each other. It doesn't isolate work profiles from user profiles. Regular SELinux-augmented thoughtfully-used uid- and process-isolation does that.
If you weren't aware, containers aren't a security boundary. Things like bubblewrap are.
Semantics make hard assertions about "containers" worthless. It depends on what one means by a container exactly, since Linux has no such concept and our ecosystem doesn't have a strict definition.
What do you think bubblewrap is, if not a container runtime?
bubblewrap is actually worse - there are known escapes in there that haven't been fixed for years
It is the most widely used sandbox layer for pretty much everything. What escapes are you talking about? Are we supposed to take your word for it? Come on
Wait. What? What escapes? Is it that bubblewrap does not faithfully implement the policy you give it, or that there are surprising gaps in the kernel's namespace isolation?
Ironically Ubuntu 24 now blocks users from accessing namespaces because that kernel interface had a bunch of local privilege escalations, breaking programs that want to use them for isolation.
For the last 10 years or so, namespaces in Linux were the source of the absolute highest number of local privilege escalations and sometimes even arbitrary code executions in kernel space. Building a kernel without user namespace support has been go-to advice for multiuser systems for almost as long. Ubuntu is just late to the game because they mostly have server or single-user-desktop customers.
Actually I think device drivers got you beat there, but no one's suggesting we break them for users' safety. Ubuntu today is more user hostile than Windows.
Device drivers are worse if you just count the numbers. But they are usually far less exploitable because very often you need to have the corresponding hardware plugged in or even need to manipulate said hardware to provide crafted inputs. So in reality, device driver problems are almost never exploitable.
Seems ironic considering namespaces are highly utilized for isolation/security purposes.
I presume they're left enabled for root.
The same software that wants to use namespaces for isolation will refuse to run as root.
Not true. Docker, for example. There's plenty of cases where you set up an isolation environment as root and then use it as non-root.
I've even seen namespaces used for hiding malicious software in Ubuntu systems too.
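For anyone who wants to see which of the restrictions discussed above apply on their own machine, here is a rough sketch that reads the sysctl files I believe are relevant. Assumptions: the knob list is from memory and not exhaustive, not every kernel exposes every file (the `unprivileged_userns_clone` and `apparmor_restrict_unprivileged_userns` ones are distro-specific as far as I know), and `read_knobs` / `userns_likely_restricted` are names made up for this example. Treat the result as a heuristic, not a definitive check.

```python
from pathlib import Path

# Sysctl files that, to my knowledge, gate unprivileged user namespaces or eBPF.
# Assumption: this list is illustrative; a missing file is reported as None.
KNOBS = {
    "user.max_user_namespaces": "user/max_user_namespaces",
    "kernel.unprivileged_userns_clone": "kernel/unprivileged_userns_clone",
    "kernel.apparmor_restrict_unprivileged_userns": "kernel/apparmor_restrict_unprivileged_userns",
    "kernel.unprivileged_bpf_disabled": "kernel/unprivileged_bpf_disabled",
}


def read_knobs(sysctl_root="/proc/sys"):
    """Return {knob name: int value, or None if the file is absent or unreadable}."""
    root = Path(sysctl_root)
    result = {}
    for name, rel in KNOBS.items():
        try:
            result[name] = int((root / rel).read_text().strip())
        except (OSError, ValueError):
            result[name] = None
    return result


def userns_likely_restricted(knobs):
    """Heuristic only: True if any knob suggests unprivileged userns are blocked."""
    return (
        knobs.get("user.max_user_namespaces") == 0
        or knobs.get("kernel.unprivileged_userns_clone") == 0
        or knobs.get("kernel.apparmor_restrict_unprivileged_userns") == 1
    )


if __name__ == "__main__":
    knobs = read_knobs()
    for name, value in knobs.items():
        print(f"{name} = {value}")
    print("unprivileged user namespaces likely restricted:",
          userns_likely_restricted(knobs))
```

The `sysctl_root` parameter only exists so the function can be pointed at a fake directory for testing; on a real system the default is what you want.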
Wouldn't Android's kernel have most of the hardening steps / disabled features described in GP's comment?
No. Things like eBPF, strace, and packet filtering are enabled. Android uses SELinux and other facilities to limit the amount of code the kernel will allow to access these features. Big difference from their being compiled out of the kernel entirely as the OP suggests is necessary.
Container isolation can fail at shared libraries in shared layers too can't it? My evil service is based on the same cooltechframework base layer as your safety critical hardware control service and if there is a mistake in the framework...
then it affects each one separately since they are separate processes. The fact they run the same code is irrelevant if the data is separate.
Separate processes running the same shared instructions. If you compromise and modify those shared instructions, the other container runs instructions of your choosing.
Layers are COW so one container modifying a layer has no effect on other containers started from the same image. Of course, preexisting vulnerabilities will remain but they'd have to be separately exploited in each container.
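To make the copy-on-write point concrete, here is a toy model. Assumptions: this is not how overlayfs or any real container runtime is implemented, and `Image`, `Container`, and the file path are hypothetical names picked for illustration. It only shows the semantics being described: writes land in a private upper layer, so the shared base layer never changes.

```python
class Image:
    """Read-only base layer shared by every container started from it."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        return self._files[path]


class Container:
    """Toy container: a shared read-only image plus a private writable layer."""

    def __init__(self, image):
        self.image = image
        self.upper = {}  # private copy-up layer, never shared between containers

    def read(self, path):
        # The private layer shadows the shared base layer.
        if path in self.upper:
            return self.upper[path]
        return self.image.read(path)

    def write(self, path, data):
        # Writes never touch the image; they only go to this container's layer.
        self.upper[path] = data


base = Image({"/lib/framework.so": "original code"})
evil = Container(base)
safe = Container(base)

evil.write("/lib/framework.so", "patched by attacker")
print(evil.read("/lib/framework.so"))  # patched by attacker
print(safe.read("/lib/framework.so"))  # original code
```

A bug already present in the shared framework still exists in both containers, of course; the model only shows that modifying the file in one does not change what the other executes.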
Worse, cannot disable eBPF due to too many packages demanding it.

Namely, nftables and its filtering.