Hacker News
by simonw 1237 days ago
This looks very promising!

The thing I most want to solve right now is this: I want to write a regular Python application that can safely execute untrusted Python code in a WASM sandbox as part of its execution.

I want to do this so I can let end users customize my web applications in weird and interesting ways by pasting their own Python code into a textarea - think features like "run this Python code to transform my stored data" - without them being able to break my system.

This feels like it should be pretty easy with WebAssembly! It's the classic code sandboxing problem - long a big challenge in the Python world - finally solved in a robust way.

I've been finding it surprisingly hard to get a proof-of-concept of this working though.

Essentially I want to be able to do this, in my regular Python code:

    import some_webassembly_engine

    python = some_webassembly_engine.load(
        "python.wasm",
        max_cpu_time_in_seconds=3.0,
        max_allowed_memory_in_bytes=32000000
    )
    result = python.execute("3 + 5")
I've not yet figured out the incantations I need to actually do this - in particular the limits on CPU time and memory.

I posed this question on Mastodon recently and Jim Kring put together this demo, which gets most of the way there (albeit using an old Python 3.6 build): https://github.com/jimkring/python-sandbox-wasm

It doesn't feel like this should be as hard to figure out as it is!

10 comments

Wasmtime's `wasmtime-py` embedding in python has support for Wasm Components: https://github.com/bytecodealliance/wasmtime-py#components (disclosure, I helped create it)

The remaining piece of the puzzle would be to create a wit-bindgen guest generator https://github.com/bytecodealliance/wit-bindgen#guests for this build of the python interpreter. You could then seamlessly call back and forth between the host and guest pythons, without even knowing that wasmtime is under the hood.

If you could provide example code for how to do this - how to run a snippet of untrusted Python code using wasmtime-py with a CPU and RAM limit - I would shout it from the rooftops. I think a LOT of people would benefit from clear examples of how to actually achieve this.
The wit-bindgen work required would be a significant undertaking (a week? more?) by someone who already has some expertise in wit & python. Maybe the wasmlabs folks are up for taking it on.

In general the Wasm Component ecosystem is still a few months away from being generally useful. There are a lot of people across the Bytecode Alliance working on the fundamentals right now, and we are making great progress, but it's not ready to ship quite yet.

Yes! Very much so.

Just tried this and it works great!

I changed app.py to this:

    import sqlite3
    print(sqlite3.connect(":memory:").execute(
        "select sqlite_version()"
    ).fetchone()[0])
And it output "3.39.2" - but the same code in my regular Python interpreter output "3.40.1", which demonstrates that the WASM Python there has its own WASM-compiled SQLite.
Great! I still need to look into how to limit memory consumption. Fuel works well enough for now, but there might be an option to limit execution by time, not just instructions.
Have you seen Extism? We call it a "plugin system", but it's far more generic than that.. just hard to market a "general purpose universal code runtime".

Python SDK: https://extism.org/docs/integrate-into-your-codebase/python-...

Python is 1 of 15+ languages we support, and as far as I know it's the easiest way to set up a wasm engine in your app, load wasm code, and call a function using complex data I/O.

I'm one of the authors - happy to share more about it, or if you'd like to join our Discord we have lots of active users and contributors there: https://discord.gg/cx3usBCWnc

I hadn't looked at that one. It looks very promising! Definitely has better documentation than the other options I've looked at.

I see that the library itself is available in Python, but this page https://extism.org/docs/category/write-a-plug-in currently only lists Rust, JavaScript, Go, Haskell, AssemblyScript, Zig and C. Any chance Python might get added to that list as well?

Just found https://github.com/extism/extism-sqlite3 which is very relevant to my interests too!

Extism does not yet support writing the plugins (our term for Wasm guests) in python. I have been working on javascript support[1] and I think I could use a similar strategy to support python. If you are at all interested in following along or helping out you should join our discord.[2] I just started a #python-pdk channel there.

[1] https://github.com/extism/js-pdk [2] https://discord.gg/mAADpt9r

Oh, and here's an example (test) showing how to construct a "manifest" to control the cpu/memory limits: https://github.com/extism/extism/blob/main/python/tests/test...

CPU is really controlled by the number of milliseconds until the wasm code is trapped.
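To make the manifest idea concrete, here is a minimal sketch that turns the byte and second limits from the original question into manifest-style values. The key names (`wasm`, `memory`, `max_pages`, `timeout_ms`) and the file name `plugin.wasm` are assumptions about the Extism manifest format rather than something taken from the linked test, so check them there before relying on them. The only fixed fact used is that a WebAssembly memory page is 64 KiB.

```python
import json

WASM_PAGE_SIZE = 64 * 1024  # a WebAssembly linear-memory page is 64 KiB


def build_manifest(wasm_path, max_memory_bytes, max_seconds):
    """Build a manifest-style dict from byte and second limits.

    The key names below are assumptions about the Extism manifest
    format; verify them against the Extism documentation.
    """
    return {
        "wasm": [{"path": wasm_path}],
        # Round down so the guest never gets more memory than requested.
        "memory": {"max_pages": max_memory_bytes // WASM_PAGE_SIZE},
        # A millisecond timeout after which the guest is trapped.
        "timeout_ms": int(max_seconds * 1000),
    }


# "plugin.wasm" is a hypothetical file name used only for illustration.
manifest = build_manifest("plugin.wasm", 32_000_000, 3.0)
print(json.dumps(manifest, indent=2))
```

With the numbers from the original question, 32,000,000 bytes rounds down to 488 pages and 3.0 seconds becomes 3000 milliseconds.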

That looks like exactly the level of control I've been hoping for in terms of sandboxing. This is really promising!
FWIW although it's not WebAssembly based, you can do that with GraalVM. It has a concept of language contexts which can be sandboxed including those constraints. There are two caveats:

1. Sandboxing for CPU time and max allowed memory requires the enterprise edition, so you'd have to pay for it.

2. The Python engine isn't 100% compatible with regular Python, although that may not matter for your use case as the compatibility is pretty good and issues mostly show up around extension modules.

Unfortunately there are at least two more major caveats:

1. Capability control only works for JavaScript (https://www.graalvm.org/latest/reference-manual/embed-langua...)

2. The documentation says in no uncertain terms that running untrusted code is unsupported (https://www.graalvm.org/latest/security-guide/#security-mode...)

The startup I'm working at is basically trying to do exactly that as a service, but a one-off thing for a regular Python application shouldn't be as hard to figure out as it is. Can you link to the Mastodon thread (darn lack of search!) and we can continue there?
Here's the Mastodon conversation: https://fedi.simonwillison.net/@simon/109682777068881522

(I'm so close to building my own search engine just against my own content there.)

I have Covid and it's late in the UK, but tried poking at wasmtime-py for the first time and got this far: https://gist.github.com/callahad/81b33e4e4456e4b27a5934c1a36...

If anyone else is awake and wants to pick up the baton...

- CPU limits are just using an arbitrary amount of wasmtime "fuel". Would be worth looking into epoch interrupts instead.

- Memory limits aren't implemented. Seems like wasmtime-py doesn't expose bindings to anything with a wasmtime::ResourceLimiter trait.

- All i/o is going through tempfiles instead of dealing with proper interfaces.
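The difference between fuel and a time-based limit can be illustrated without any WebAssembly runtime. The toy interpreter below is purely hypothetical and is not wasmtime's API: it charges one unit of fuel per instruction, and separately checks a wall-clock deadline every 1024 instructions, which is only loosely analogous to an epoch check.

```python
import time


class OutOfFuel(Exception):
    """Raised when the instruction budget is exhausted."""


class DeadlineExceeded(Exception):
    """Raised when the wall-clock deadline has passed."""


def run(program, fuel=None, deadline_seconds=None):
    """Run a tiny stack program under an instruction and/or time budget.

    program is a list of tuples: ("push", n), ("add",) or ("jump", index).
    """
    stack, pc, executed = [], 0, 0
    started = time.monotonic()
    while pc < len(program):
        # Fuel: deterministic, counts instructions rather than seconds.
        if fuel is not None and executed >= fuel:
            raise OutOfFuel(f"stopped after {executed} instructions")
        # Deadline: checked only periodically, so it is cheap but coarse.
        if (deadline_seconds is not None and executed % 1024 == 0
                and time.monotonic() - started > deadline_seconds):
            raise DeadlineExceeded(f"stopped after {deadline_seconds}s")
        op = program[pc]
        executed += 1
        if op[0] == "push":
            stack.append(op[1])
            pc += 1
        elif op[0] == "add":
            stack.append(stack.pop() + stack.pop())
            pc += 1
        elif op[0] == "jump":
            pc = op[1]
        else:
            raise ValueError(f"unknown op {op[0]!r}")
    return stack[-1] if stack else None


result = run([("push", 3), ("push", 5), ("add",)], fuel=10)
print(result)
```

An endless loop such as `run([("jump", 0)], fuel=1000)` raises `OutOfFuel` after exactly 1000 instructions on any machine, while `run([("jump", 0)], deadline_seconds=0.05)` raises `DeadlineExceeded` after a machine-dependent number of instructions. That trade-off is the reason to compare the two approaches.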

Why do this on the client? Why not pass it to the server and run it on Python there?
That's what I'm talking about: I want to run Python code on my server, but since it's from an untrusted source I want to make sure that it's in a sandbox with strict limits on what it can do, how much CPU it can use and how much RAM it has available to it - so malicious code can't be used to crash my server or steal data it shouldn't have access to.
How do you think WASM will solve this where everything else has failed?
Because WASM supports capability-based security and thus never trusts code to do the right thing, unlike every single one of its predecessors.
WASM does not support capability-based security; that is something of an extension to the WASI proposal.

And even then the security stuff is based on previously existing work in the cloud space. Which has existed for some time but is not widespread.
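For readers unfamiliar with the term, the capability idea under discussion can be sketched in a few lines: the guest gets no ambient access and can only use handles the host passes in, much like a WASI preopened directory. The class and function names here are hypothetical, and plain Python cannot actually enforce this boundary, so treat it as an illustration of the model rather than a sandbox.

```python
import tempfile
from pathlib import Path


class PreopenedDir:
    """A handle that grants read access to one directory tree only."""

    def __init__(self, root):
        self._root = Path(root).resolve()

    def read_text(self, relative_path):
        target = (self._root / relative_path).resolve()
        # Reject anything that resolves outside the granted directory.
        if not target.is_relative_to(self._root):  # Python 3.9+
            raise PermissionError(
                f"{relative_path!r} escapes the granted directory"
            )
        return target.read_text()


def guest(files):
    # The guest reaches the filesystem only through the handle it was given.
    return files.read_text("data.txt").upper()


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.txt").write_text("hello")
    granted = PreopenedDir(tmp)
    print(guest(granted))
    try:
        granted.read_text("../../etc/passwd")
    except PermissionError as exc:
        print("denied:", exc)
```

In a WebAssembly runtime the equivalent check is enforced by the host, outside the guest's address space, which is what makes the model robust there.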

Java? .NET?
Depending on what your inputs and outputs look like, perhaps you can spawn the Python interpreter in a subprocess that's sandboxed with seccomp and/or setrlimit?
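A minimal sketch of the setrlimit half of that suggestion, using only the standard library and assuming Linux. It caps CPU time and address space for a child interpreter but does not restrict filesystem or network access, so on its own it is not a complete sandbox; seccomp filtering is not shown. The function name `run_limited` and the default limits are illustrative choices.

```python
import resource
import subprocess
import sys


def run_limited(code, cpu_seconds=3, memory_bytes=512 * 1024 * 1024):
    """Run code in a child interpreter with CPU and address-space caps."""

    def cap(kind, value):
        # Never ask for more than the hard limit we already have.
        _, hard = resource.getrlimit(kind)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)
        resource.setrlimit(kind, (value, value))

    def apply_limits():
        # Runs in the child just before exec; the kernel enforces both.
        cap(resource.RLIMIT_CPU, cpu_seconds)
        cap(resource.RLIMIT_AS, memory_bytes)

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=cpu_seconds + 5,  # wall-clock backstop for code that sleeps
    )


ok = run_limited("print(3 + 5)")
print(ok.stdout.strip())

# A busy loop is terminated by the kernel once it uses up its CPU budget.
spin = run_limited("while True: pass", cpu_seconds=1)
print(spin.returncode)
```

The second call returns a non-zero exit status because the child is killed by a signal. The `resource` module is Unix-only, and `RLIMIT_AS` in particular behaves differently across platforms, so this approach does not carry over to Windows.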
I have not been able to figure out how to do that in the past. I think a solution using those would be restricted to Linux, and I want something that also works on macOS and maybe even Windows too.
Run it in a docker container?
This would be great, especially with a memory-safe API that could be exposed to wasm applications, and rate limited.
https://nsjail.dev is a common tool for sandboxing
What kind of Python types do you envision needing to pass across the native/wasm boundary?
Have you tried to do it with pyodide? What issues did you hit using that?
Pyodide isn't currently supported outside of browsers, though that might change: https://github.com/pyodide/pyodide/issues/869

Either way, I couldn't figure out how to do the above sequence of steps with any of the available Python WASM runtimes - they're all very under-documented at the moment, sadly. I tried all three of these:

- https://github.com/wasmerio/wasmer-python

- https://github.com/bytecodealliance/wasmtime-py

- https://github.com/wasm3/pywasm3