by icebraining 3396 days ago
I think the implication is that the isolation should be per customer, each being allocated their own parsing process, isolated from the other customers.

That's roughly what we do, though we run a hosted version of an open source webapp, not a CDN. It's more expensive resource-wise (particularly RAM), but it has meant that we were immune to 90%+ of the security bugs discovered in the platform.

Sure, that's a valid question to ask. But imagine you have 1,000,000 customers. Now you have to calculate and manage scaling groups for 1,000,000 customers * number of services. The resourcing costs alone would be outlandish, not to mention trying to independently scale each customer. Perhaps container systems would make this easier, but do they have better memory isolation? Is it possible for a container process to overrun into another container's memory without an exploit in the container system?
A container is a process with some extra isolation (namespaces); containers certainly can't overrun into each other without an exploit.

Why would the costs be outlandish? We offer that and we're fairly cheap. Since the cost is mostly fixed per customer, it should scale linearly.

As for scaling, they already have to do that, by pointing different requests at different servers depending on their load, etc.

Assuming for a minute that containers aren't in play, then the isolation model becomes that of a server/vm with the associated overhead of each. To make this easier we'll assume there's only a single service, even though we know this to be untrue.

If there are 1M customers that's a minimum of 1M servers. Some customers are obviously larger and would need more. There's also HA. Let's conservatively call it 2.5M servers.

At an absolute bare minimum we'd need to allocate 2.5M GB of RAM and 2.5M vCPUs. That's a huge amount of resources.

If you could reliably fit 10,000 small customers on a single server with 32 GB of RAM and 8 CPUs, you can already start to see how many resources could be saved.

Without customer isolation you've got the entire cluster to handle load spikes and HA. With isolation you need scheduling that monitors each of the 1M clusters and scales it appropriately by anticipating demand.

Scaling a service is way easier than scaling customers within a service or many services.
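
The server counts above can be checked with a few lines of arithmetic. This is only a rough sketch: the 1 GB and 1 vCPU per isolated server are read off the "2.5M GB / 2.5M vCPU" figure, the function names are made up for illustration, and the shared case assumes every customer is "small" and ignores HA headroom.

```python
import math

# Figures taken from the comment above; everything else is an assumption.
CUSTOMERS = 1_000_000
ISOLATED_SERVERS = 2_500_000       # one per customer, padded for big customers and HA
GB_PER_ISOLATED_SERVER = 1         # implied by "2.5M GB of RAM"
VCPU_PER_ISOLATED_SERVER = 1       # implied by "2.5M vCPU"

CUSTOMERS_PER_SHARED_SERVER = 10_000
GB_PER_SHARED_SERVER = 32
CPU_PER_SHARED_SERVER = 8


def isolated_totals():
    """Total (GB of RAM, vCPUs) for the one-server-per-customer model."""
    return (
        ISOLATED_SERVERS * GB_PER_ISOLATED_SERVER,
        ISOLATED_SERVERS * VCPU_PER_ISOLATED_SERVER,
    )


def shared_totals(customers=CUSTOMERS):
    """Total (servers, GB of RAM, CPUs) when small customers share servers."""
    servers = math.ceil(customers / CUSTOMERS_PER_SHARED_SERVER)
    return (
        servers,
        servers * GB_PER_SHARED_SERVER,
        servers * CPU_PER_SHARED_SERVER,
    )


print(isolated_totals())  # (2500000, 2500000)
print(shared_totals())    # (100, 3200, 800)
```

Under these assumptions the shared model needs 100 servers, against 2.5M in the fully isolated one.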

> Assuming for a minute that containers aren't in play, then the isolation model becomes that of a server/vm with the associated overhead of each.

Why? There's nothing magical about a container; it's literally just a cgroup of Linux processes. You don't have to use them to get the memory isolation we're talking about - uncontained processes get it too.

That's what we do: one process per client, uncontained, just running on a different system user.

But in any case, sure, use containers, I'm certainly not opposed to them.
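
A minimal sketch of the one-process-per-client idea, not the actual setup described above. The parser here is a hypothetical stand-in that just upper-cases its input, `parse_in_own_process` and `system_user` are made-up names, and the sketch starts a short-lived process per call where a real deployment would more likely keep one long-running process per client. Switching to another system user needs Python 3.9+ on POSIX and enough privilege (typically root), so it is optional here.

```python
import subprocess
import sys

# Hypothetical stand-in for the real parser: upper-cases whatever arrives on stdin.
PARSER_SNIPPET = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def parse_in_own_process(payload, system_user=None):
    """Run the stand-in parser for one client in a fresh OS process.

    If system_user is given, the child runs as that user (Python 3.9+,
    POSIX only, and the caller must be allowed to switch users).
    """
    kwargs = {}
    if system_user is not None:
        kwargs["user"] = system_user
    result = subprocess.run(
        [sys.executable, "-c", PARSER_SNIPPET],
        input=payload,
        capture_output=True,
        text=True,
        check=True,   # raise if the child exits with a non-zero status
        **kwargs,
    )
    return result.stdout


print(parse_in_own_process("client a data"))  # CLIENT A DATA
```

A memory bug in the child can then only read that child's own address space, which is the isolation being argued for.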

That really isn't practical given the number of datacentres CF are in * the number of free customers they have.

Perhaps for some tier of paid customer.

There are many ways to ensure a system is fault-tolerant, scalable and still reasonably safe.

They certainly have the resources to solve this if they want to.

All I can say is that I am not willing to pay for something so fragile, but that is only my own opinion.

In the real world things may go differently: they exposed critical data back in 2012 too and they're still here ...

Linux namespaces/containers create a memory page table completely separate from the host's, so barring vulnerabilities in the container implementation that allow mapping host physical memory to guest virtual memory, isolation is strictly enforced by the memory controller in the hardware. Without an exploit, the worst case scenario is leaking shared library read-only sections across containers (since the physical memory might be shared for a smaller container footprint, although I don't know if LXC supports that yet).
> Linux namespaces/containers create a memory page table completely separate from the host's

Each _process_ has its own memory page table. Containers are built out of processes, so they inherit this attribute.

Namespaces have nothing to do with it.
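
A small illustration of the per-process point, assuming a POSIX system where `os.fork` is available (the `secret` variable and the function name are made up). The child overwrites a value it inherited, and the parent's copy stays untouched because the write lands in the child's own copy-on-write pages.

```python
import os

# Pretend this is one customer's data held in the parent process.
secret = ["customer-a-data"]


def mutate_in_child():
    """Fork, overwrite `secret` in the child, and report what the parent sees."""
    pid = os.fork()
    if pid == 0:
        # Child process: this write only affects the child's address space.
        secret[0] = "overwritten-in-child"
        os._exit(0)
    # Parent process: wait for the child, then look at our own copy.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), secret[0]


print(mutate_in_child())  # (0, 'customer-a-data')
```

No namespace or container is involved; plain processes already get separate address spaces.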

Sorry, I should have elaborated: with namespaces, each container instance gets its own process table with separate non-shareable pages (without KSM or other dedup feature) and then each container process gets its own page tables, like they normally do. The point is that there's an extra level of isolation beyond just processes, although there is still the kernel attack surface.
> each being allocated their own parsing process

That just punts the vulnerable code elsewhere. A kernel bug could leak memory across processes. And the kernel is also written in C, so you aren't getting protection from a "better" language either.

That's assuming that the likelihood of such a bug in the kernel code is the same as a bug in an HTML parser. And also that this bug would go unnoticed for months. The fact that both are written in C doesn't make them equal.
Absolutely true.
Thing is, with the single-process model, a kernel bug could also leak memory between customers. And in fact, it's much more likely, since from the kernel's point of view they're all in the same security context. So it's not punting the code, it's reducing the attack surface.
> And in fact, it's much more likely, since for it they're all in the same security context.

I disagree. I think it's much less likely, because the kernel doesn't usually get involved in the process' memory once it is allocated.

Sure, it maps virtual memory to physical memory as needed, but bugs there are likely to cause severe corruption resulting in an immediate crash. The kernel doesn't go low level enough for it to be likely to result in messing with a process such that the single process continues to work but also leaks across the process-internal customer boundary that the kernel cannot see. That would require a level of surgical precision I don't think is likely in a bug.

To be clear, I'm not saying that you shouldn't use per-customer processes. The kernel has more eyes and is less likely to be vulnerable in this way. Just that from an analytical perspective, you are really just moving the problem elsewhere, rather than solving it.