by aapoalas 809 days ago
This is not strictly true. If you waved a magic Rust wand over the whole V8 codebase and converted it into exactly equivalent Rust code, Rust would definitely not help you one bit, and the code would be filled with unsafe blocks. Many commenters have been saying that this is because it must be, otherwise the engine couldn't be written. For the JIT side of things I definitely agree. But as for the runtime's static implementations (those written in C++/Rust), it is entirely possible to write the runtime in safe Rust, which would definitely catch e.g. the simple example error they use in the blog post.

One option would be to use an existing gc crate: It checks at runtime that Rust's borrowing rules are upheld, so it would abort the program instead of doing the OoB access. This is of course not nice but it is memory safe. Obviously this would also make the engine less performant as now we're doing extra work on every read and write into a GC object.
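As a rough sketch of what that runtime check looks like, here is the same idea using std's `Rc<RefCell<..>>` as a stand-in for a GC pointer. This is not the gc crate's actual API, and `ArrayHandle`, `js_callback_shrinks` and `read_while_calling_back` are names made up for illustration:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a GC-managed JS array. A real engine would use
/// a GC pointer type here; `Rc<RefCell<..>>` is only an assumption that
/// shows the same runtime borrow check.
type ArrayHandle = Rc<RefCell<Vec<f64>>>;

/// Simulates user JS code that empties the array while the engine is still
/// working on it. Returns `false` if the runtime borrow check refused the
/// mutation.
fn js_callback_shrinks(array: &ArrayHandle) -> bool {
    match array.try_borrow_mut() {
        Ok(mut elements) => {
            elements.clear();
            true
        }
        // Somebody else still holds a borrow: the mutation is rejected.
        Err(_) => false,
    }
}

/// Engine code that keeps a shared borrow of the elements alive across a
/// call back into "JS", then reads an element through that borrow.
fn read_while_calling_back(array: &ArrayHandle, index: usize) -> (bool, Option<f64>) {
    let elements = array.borrow();
    let mutated = js_callback_shrinks(array);
    (mutated, elements.get(index).copied())
}
```

With `borrow_mut` instead of `try_borrow_mut` the callback would panic rather than return `false`, which is the "abort instead of OoB access" behaviour described above. Either way the flag check on every access is the extra work being paid for.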

Another option is to let go of the idea of GC objects holding mutable pointers to one another. V8 already uses 32-bit offsets from the base of a 4 GB region to find the item; these "compressed pointers" work because V8 always knows it only needs to upshift a bit to get the actual pointer, and because C++ doesn't care about multiple objects holding pointers to one another. A Rusty alternative to this would be that the "compressed pointers" are considered only 32-bit offsets from some base pointer that is held by the Isolate: Now Rust would not allow these to be actual pointers or become references. Instead you'd need to implement some API at the Isolate that gives you a reference to an item based on that offset, and the reference's lifetime is determined by your Isolate's lifetime.

Now, any call to JavaScript is always (in a theoretical sense) capable of mutating anything within the Isolate's heap. This means that calling a JS function would require an exclusive `&mut Isolate` reference, and so Rust understands that it cannot hold a reference to the Array's buffer (which it got from the Isolate heap) during a call to a JS function.
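A minimal sketch of that API shape, under stated assumptions: `Compressed`, `Isolate`, `array_elements`, `set_array_length` and `call_js` are all hypothetical names, the heap is a flat `Vec<u64>` of words, and offsets are resolved with bounds-checked indexing where a real engine would do raw pointer arithmetic:

```rust
/// Hypothetical 32-bit "compressed pointer": a word offset from the
/// Isolate's heap base. It is plain data, never a reference.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Compressed(u32);

/// Hypothetical Isolate owning one flat heap of 64-bit words.
struct Isolate {
    heap: Vec<u64>,
}

impl Isolate {
    fn new() -> Self {
        Isolate { heap: Vec::new() }
    }

    /// Allocates an array laid out as [length, elem0, elem1, ...].
    fn alloc_array(&mut self, elements: &[u64]) -> Compressed {
        let offset = self.heap.len() as u32;
        self.heap.push(elements.len() as u64);
        self.heap.extend_from_slice(elements);
        Compressed(offset)
    }

    /// Resolves an offset into a borrowed slice of the live elements.
    /// The slice borrows from `&self`, so it cannot outlive that borrow.
    fn array_elements(&self, array: Compressed) -> &[u64] {
        let start = array.0 as usize + 1;
        let len = self.heap[array.0 as usize] as usize;
        &self.heap[start..start + len]
    }

    /// Stand-in for JS code shrinking `array.length` (shrink only here).
    fn set_array_length(&mut self, array: Compressed, new_len: u64) {
        let slot = &mut self.heap[array.0 as usize];
        *slot = new_len.min(*slot);
    }

    /// Calling into JS takes `&mut self`: JS may mutate anything in the heap.
    fn call_js<R>(&mut self, function: impl FnOnce(&mut Isolate) -> R) -> R {
        function(self)
    }
}

/// Engine code that must re-resolve the array after calling into JS.
fn read_after_callback(isolate: &mut Isolate, array: Compressed, index: usize) -> Option<u64> {
    // let elements = isolate.array_elements(array); // shared borrow starts here
    isolate.call_js(|iso| iso.set_array_length(array, 1));
    // elements[index] // using the old borrow here is rejected by the borrow checker
    isolate.array_elements(array).get(index).copied()
}
```

Uncommenting the two marked lines makes the function fail to compile, because the shared borrow from `array_elements` would still be alive across the `&mut` call. The only thing that survives the call is the `Compressed` offset, which has to be resolved again against the possibly changed heap.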

This sort of API would internally need some unsafe because it is a pointer offset we're doing here. If you don't like that, you can also go a step further and build your heap as a collection of vectors and then have your "pointers" be a combination of a type key and an index into that type's vector. With this sort of heap structure there is no unsafe usage needed, as getting a reference is just borrowing from a Vec within the heap.
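Roughly, the vector-based heap could look like the sketch below. The names (`Value`, `Heap`, `create_string` and so on) are invented for illustration and are not taken from any particular engine; the enum variant plays the role of the type key:

```rust
/// Hypothetical JS value: the enum variant is the type key, the payload is
/// the index into that type's vector. It is `Copy` and owns nothing.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Number(f64),
    String(u32),
    Array(u32),
}

/// Hypothetical heap: one vector per heap-allocated type.
#[derive(Default)]
struct Heap {
    strings: Vec<String>,
    arrays: Vec<Vec<Value>>,
}

impl Heap {
    fn create_string(&mut self, data: &str) -> Value {
        self.strings.push(data.to_owned());
        Value::String((self.strings.len() - 1) as u32)
    }

    fn create_array(&mut self, elements: Vec<Value>) -> Value {
        self.arrays.push(elements);
        Value::Array((self.arrays.len() - 1) as u32)
    }

    /// Getting a reference is just a bounds-checked borrow from a Vec.
    fn string(&self, value: Value) -> Option<&str> {
        match value {
            Value::String(index) => self.strings.get(index as usize).map(String::as_str),
            _ => None,
        }
    }

    fn array(&self, value: Value) -> Option<&[Value]> {
        match value {
            Value::Array(index) => self.arrays.get(index as usize).map(Vec::as_slice),
            _ => None,
        }
    }

    fn array_mut(&mut self, value: Value) -> Option<&mut Vec<Value>> {
        match value {
            Value::Array(index) => self.arrays.get_mut(index as usize),
            _ => None,
        }
    }
}
```

Since a `Value` is only a key plus an index, an array can contain a handle to itself (`heap.array_mut(list).unwrap().push(list)`) without any trouble from the borrow checker. A wrong type key or a stale index gives `None` rather than undefined behaviour.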

These "Rusty" ways of building the heap offer some interesting philosophical thoughts to ponder: The V8 way is kind of a faithful structuring of the ECMAScript concept of "object ownership" in C++: Items refer to each other directly, and can even do it recursively. Ownership of the memory is just kind of... there. It's obvious, right? This object refers to that, so it owns it. Except maybe if they're recursive and not accessible elsewhere... I mean, just don't think about it! (Unless you're building the GC algorithm.) The safe Rust heap structures make the memory ownership quite concrete: The Isolate owns all of its GC objects / everything in its heap. GC objects and references between them are NOT actual memory ownership relations! This JS object referring to that JS object means nothing to the memory ownership. A JS object exists as long as the Isolate thinks it exists, even if no other object refers to it.

So, the C++ / unsafe Rust way to write the Heap tries to unify JS ownership logic and the host language ownership logic into one. When these two don't agree, bugs can happen (like in the simple example provided: JS Array semantics written naively in C++ cause issues because C++ has different expectations). The safe Rust way instead throws the JS ownership logic out of the host language entirely and forces the engine to implement that runtime ownership logic on its own.
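To make "implement that runtime ownership logic on its own" a bit more concrete, here is a deliberately tiny mark pass over an index-based heap. It is only a sketch: `ObjectIndex`, `Object`, `TracedHeap` and `mark_reachable` are hypothetical names, and a real collector would also need a sweep or compaction step and a way to find the roots:

```rust
/// Hypothetical handle: an index into the heap's object vector.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ObjectIndex(usize);

/// Hypothetical JS object reduced to the only thing the marker cares about:
/// which other objects it refers to.
struct Object {
    references: Vec<ObjectIndex>,
}

/// The heap owns every object; references between objects are just indices.
struct TracedHeap {
    objects: Vec<Object>,
}

/// Marks everything reachable from `roots`. The result has one flag per
/// object; `false` means the JS-level ownership logic considers it dead,
/// even though the heap still owns its memory until a sweep removes it.
fn mark_reachable(heap: &TracedHeap, roots: &[ObjectIndex]) -> Vec<bool> {
    let mut marked = vec![false; heap.objects.len()];
    let mut worklist: Vec<ObjectIndex> = roots.to_vec();
    while let Some(ObjectIndex(index)) = worklist.pop() {
        if marked[index] {
            // Already visited: this is what makes cycles terminate.
            continue;
        }
        marked[index] = true;
        worklist.extend(heap.objects[index].references.iter().copied());
    }
    marked
}
```

A cycle that is reachable from a root gets marked, and a cycle that nothing else points to stays unmarked, so the "recursive and not accessible elsewhere" case is handled by the engine's own tracing and not by host-language ownership.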

Source: I am writing a JS engine in (safe) Rust using the "heap is a collection of vectors of Ts" method.