by tschneidereit 2416 days ago
What you're describing will indeed be introduced with the WebAssembly GC proposal: https://github.com/WebAssembly/gc

For languages that can express unforgeable pointers as a first-class concept, that is indeed a very attractive, fine-grained approach. Unfortunately bringing that to languages like C/C++/Rust is a different matter altogether.

Since we want to support those languages as first-class citizens, we can't require GC support as a base concept, so we have to treat a nanoprocess as the unit of isolation from the outside.

Once we have GC support, nothing will prevent languages that can use it from expressing finer-grained capabilities even within a nanoprocess, and that seems highly desirable indeed.

(full disclosure: I'm a Mozilla employee and one of the people who set up the Bytecode Alliance.)

4 comments

That future possibility reminds me of https://en.wikipedia.org/wiki/Singularity_(operating_system) - where process/address-space isolation was replaced with fine-grained static verification of high-level code (presumably not the first experiment in this area).
Indeed: that and many other things are prior art in this space. And there is a lot of prior art for what we're working on—this is not meant as an academic research project! :)
Yes, one of the answers I want to give any time someone asks "why will WASM succeed when the JVM didn't" is that there is 25 years more experience and research to draw upon.
And yet bounds checking access validation was left out of the design, something that most previous research projects took care to taint as unsafe packages when present.
> For languages that can express unforgeable pointers as a first-class concept, that is indeed a very attractive, fine-grained approach. Unfortunately bringing that to languages like C/C++/Rust is a different matter altogether.

The semantics of these languages aren’t incompatible with unforgeable references, though: it generally works in practice, but it’s technically undefined to create pointers out of thin air. Why can’t we take advantage of the standard here to disallow illegally created references? (Which, as I understand it, many other vendors are already beginning to do with e.g. pointer authentication and memory tagging.)

It's slow. You should read up on the challenges of implementing memcpy in C emulated on Java. Basically you have to manually implement paging.
What would allow other languages to represent unforgeable pointers as a first class concept and not C/C++/Rust?

Forging a pointer is UB in all of these languages as far as I know.

It seems like you should be able to have opaque types that represent these unforgeable pointers which you can't do arithmetic on or cast to raw pointers, but can access values in type safe ways, or provide a view to a byte slice which does bounds check on access.
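
To make that concrete, here is a rough sketch of the kind of opaque type I have in mind. The names (`ByteCap`, `grant`, `narrow`) are made up for illustration and are not from any existing proposal or crate. It only holds up as long as all code touching it is safe Rust; anything that can write to the same linear memory with unsafe code could get around it.

```rust
mod cap {
    /// Hypothetical opaque capability over a region of bytes.
    /// The field is private, so code outside this module cannot build one
    /// from an integer, do arithmetic on it, or turn it into a raw pointer.
    pub struct ByteCap<'a> {
        region: &'a mut [u8],
    }

    impl<'a> ByteCap<'a> {
        /// Only the owner of the memory can mint a capability for it.
        pub fn grant(region: &'a mut [u8]) -> Self {
            ByteCap { region }
        }

        pub fn len(&self) -> usize {
            self.region.len()
        }

        /// Bounds-checked read: returns None instead of reading past the end.
        pub fn read(&self, index: usize) -> Option<u8> {
            self.region.get(index).copied()
        }

        /// Bounds-checked write: returns false if the index is out of range.
        pub fn write(&mut self, index: usize, value: u8) -> bool {
            match self.region.get_mut(index) {
                Some(slot) => {
                    *slot = value;
                    true
                }
                None => false,
            }
        }

        /// Narrow the capability to a sub-range. It can shrink, never grow.
        pub fn narrow(self, range: std::ops::Range<usize>) -> Option<ByteCap<'a>> {
            let region: &'a mut [u8] = self.region;
            region.get_mut(range).map(|sub| ByteCap { region: sub })
        }
    }
}
```

A holder can call `read`, `write` and `narrow`, but every access goes through a bounds check and there is no way to widen the range again.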

Is there a good place for discussion of this design? I seem to be having this conversation with you and Josh both here and on Reddit, and it seems like a lot of the discussion is spread out in a lot of places.

In unsafe Rust you can arbitrarily increase the length of a vector/string by modifying the stored length. You do not need to forge the pointer itself to break the pointer's invariant.
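A small sketch of what I mean, using `Vec::set_len` from the standard library. The function name `stretch_len_demo` is just for illustration. This particular use stays within the documented safety contract (the new length fits the capacity and the elements were initialized earlier), but it shows that the length is only a stored number that unsafe code can change while the pointer stays exactly the same.

```rust
/// Returns the buffer address before and after the length change,
/// together with the vector itself.
fn stretch_len_demo() -> (usize, usize, Vec<u8>) {
    let mut v: Vec<u8> = vec![1, 2, 3, 4];
    let ptr_before = v.as_ptr() as usize;

    // Logical length is now 2; the bytes 3 and 4 still sit in the buffer,
    // because `u8` has no destructor and truncate does not shrink capacity.
    v.truncate(2);

    // SAFETY (for this demo only): 4 <= capacity, and elements 2..4 were
    // initialized by the `vec!` above. With a larger number the same call
    // would expose memory the vector never initialized.
    unsafe {
        v.set_len(4);
    }

    let ptr_after = v.as_ptr() as usize;
    (ptr_before, ptr_after, v)
}
```

The pointer is never touched, yet what it is allowed to reach has changed. A verifier that only tracks where pointers come from would not notice.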
You would need to do either static or dynamic bounds checking when accessing memory via these capabilities. You obviously can't just give arbitrary code a pointer and let it read however far it wants past the end of it.

Given that most code in Rust is safe code and includes bounds checks before access, you should be able to have the verifier rely on those when they exist, and add in bounds checks in cases in which the access is not protected by a bounds check.

Maybe that would be intractable, or too inefficient to be worth it with all of the extra bounds checks. I'm not sure. I'm asking because it's something that I feel should be possible, but I haven't been involved in the research or development, so I'm wondering if those who have been more involved have references to discussion about the topic.

>Since we want to support those languages as first-class citizens, we can't require GC support as a base concept

I feel like you're overthinking it. Can't you just have a table that holds GCable objects and only hand out indexes to C and co?

That is how we support references in the Rust toolchain right now, via wasm-bindgen, and it's an important part of making unforgeable references work for languages that rely on linear memory.

It doesn't help with making capabilities more fine-grained, though: we have to treat all code that has access to that table as having the same level of trust.
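
For illustration, a minimal sketch of the table-of-handles idea. This is not wasm-bindgen's actual implementation; `HandleTable` and its methods are hypothetical names, and a real table would live on the host side of the boundary.

```rust
/// Hypothetical handle table: the real objects stay in the table and
/// linear-memory code only ever sees small integer indexes.
pub struct HandleTable<T> {
    slots: Vec<Option<T>>,
    free: Vec<u32>, // indexes of emptied slots, reused on insert
}

impl<T> HandleTable<T> {
    pub fn new() -> Self {
        HandleTable { slots: Vec::new(), free: Vec::new() }
    }

    /// Store an object and return the index that stands in for it.
    pub fn insert(&mut self, value: T) -> u32 {
        if let Some(idx) = self.free.pop() {
            self.slots[idx as usize] = Some(value);
            idx
        } else {
            self.slots.push(Some(value));
            (self.slots.len() - 1) as u32
        }
    }

    /// Look up an object. Unknown or freed indexes give None.
    pub fn get(&self, handle: u32) -> Option<&T> {
        self.slots.get(handle as usize).and_then(|slot| slot.as_ref())
    }

    /// Drop the table's reference and make the slot reusable.
    pub fn remove(&mut self, handle: u32) -> Option<T> {
        let value = self.slots.get_mut(handle as usize)?.take()?;
        self.free.push(handle);
        Some(value)
    }
}
```

The indexes are plain integers, so any code that can call `get` can try `0`, `1`, `2` and reach every live object in the table. That is why everything sharing one table ends up at the same trust level.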