Hacker News new | ask | show | jobs
by moonlet 165 days ago
I am so sick of the ‘sandboxed’ AI-infra meme. A container is not a sandbox. A chroot is not a sandbox. A VM is also not a sandbox. A filesystem is also also not a sandbox. You can sandbox an application, you can run an application in a secure context, but this is not a secure context the author is describing, firstly, and secondly they haven’t described any techniques for sandboxing unless that part of the page didn’t load for me somehow.
5 comments

Didn’t mean to say this is a sandbox, it certainly isn’t, this is just an illustration on how to bridge the gap and make things available in a file system from the source of truth of your application.

There is tons of more complexity to sandboxing, I agree!

No worries! And I definitely appreciate you taking time to write up your work, it’s a good blog.
Wait, can you provide the positive definition for "sandbox" you're relying on here?
To me ‘a sandbox’ is a secured context, which is specific to whatever is in it. It is not a generic thing unless we are literally referring to a real-world box with sand in it, and I’ve kinda hit the breaking point with the term in tech. ‘A sandboxed application’ to me is an instrumented and controlled deployment of an application that can only make the sys/network/ipc calls the deployer expects and appreciates, which are then themselves filtered and monitored. A sandboxed deployment of an application? Sure. That’s a thing to me. But each application needs different privileges and does different things. Sandboxing an application may involve lots of different technologies. Eg the way I think about it, things like seccomp, apparmor, et al also aren’t themselves ‘sandboxes’, they’re enforcement mechanisms which rely on knowing and configuring them to monitor and enforce what the app should and shouldn’t do. A lot of things that assist with sandboxing may also be combined in different ways to get to a more secure environment, in which the app is sandboxed.
You may just be using a personalized definition of that word, that differs from what it means.

https://en.wikipedia.org/wiki/Sandbox_(computer_security)

Notably, a sandbox exists to separate one thing from other things. Limiting/filtering/monitoring what the sandboxes thing can do are often components of that, but the underlying premise is about separation.

Containers, VMs, etc. are 100% examples of sandboxing based on the actual industry definition of the term.

I’m saying I don’t think sandbox is a noun, I think it’s a verb. I also don’t get why this is such an issue to you? A container simply is not a sandbox by itself. The collection of technologies that can sandbox can be used to sandbox a container, or an app running in a container, or whatever you want. A door lock isn’t security, a door lock is used to lock your door, which gives you part of a security strategy. Same principle.
A door lock is a lock and you can lock a door lock. A container can be a sandbox and you can use a container to sandbox.
> I’m saying I don’t think sandbox is a noun, I think it’s a verb.

You are incorrect.

What background or context do you have that you base this claim on?
No they are not. The "industry" totally disagrees with this statement as well.
This is definitely false. "The industry" calls everything a sandbox.
I recently had a question about what AI sandboxes use and I think Modal uses gvisor under the hood and I think others use firecracker/generally favour it as well

Firecracker kind of ends up being in the VM categories and I would place gvisor in a similar category too under the VM

So in my opinion, VM's are sandboxes.

Of course there is also libriscv https://github.com/libriscv/libriscv which is a sandbox (The fastest RISC-V sandbox)

There is also https://github.com/Zouuup/landrun Run any Linux process in a secure, unprivileged sandbox using Landlock. Think firejail, but lightweight, user-friendly, and baked into the kernel.

Your mileage may vary but I consider firecracker to be the AI sandbox usually. Othertimes it can be that they abstract on a cloud provider and open up servers in that or similar (I feel E2B does this on top of gcp)

A lot of these "ai sandbox" conversations target code that is already running in a public cloud. Running firecracker doesn't give you magical isolation properties vs running an application in ec2 - it's the same boundary. If you're trying to compare to running multi-tenant workloads in containers on the same vm vs different tenants on different vms - sure that's an improvement but no one said you had to run containers to begin with.

Furthermore, running lots of random 3rd party programs in the same instance, be it a container, or an ec2 vm, or a firecracker vm all have the same issues - it is inherently totally unsafe. If you want to "sandbox" something you need to detail what exactly you are wanting to isolate.

A lot of people might suggest not being able to write to the filesystem, read env vars, or talk over the network but these are table stakes for a lot of the workloads that people want to "isolate" to begin with.

So not only is there this incorrect view that you are isolating anything at all, but I'm not convinced that the most important things, like being able to run arbitrary 3rd party programs, is even being considered.

Please brother may i have some pledge unveil
By your definition, a physical child's sandbox isn't a sandbox.