Hacker News new | ask | show | jobs
by duped 1665 days ago
This is a really hard problem because you have to discard the magic wand of a compiler and look at what is really happening under the hood.

At its most rudimentary level, a "memory safe" program is one that does not access memory that is forbidden to it at any point during execution. Memory safety can be achieved using managed languages or subsets[1] of languages like Rust - but that only works if the language implementations have total knowledge of memory accesses that the program may perform.

The trouble with FFIs is that by definition, the language implementations cannot know anything about memory accesses on the other side of the interface - it is foreign, a black box. The interface/ABI does not provide details about who is responsible for managing this memory, whether it is mutable or not, if it is safe to be reused in different threads, indeed even what the memory "points to."

On top of that, most of the time it's on the programmer to express to the implementation what the FFI even is. It's possible to get it wrong without actually breaking the program. You can do things like ignore signedness and discard mutability restrictions on pointers with ease in an FFI binding. There's nothing a language can do to prevent that, since the foreign code is a black box.

Now there are some exceptions to this, the most common probably being COM interfaces defined by IDL files, which are language agnostic and slightly safer than raw FFI over C functions. In this model, languages can provide some guarantees over what is happening across FFI and which operations are allowed (namely, reference counting semantics).

The way around all of this is simple : don't share memory between programs in different languages. Serialize to a data structure, call the foreign code in an isolated process, and only allow primitive data (like file descriptors) across FFI bounds.

This places enormous burden on language implementers, fwiw, which is probably why no one does it. FFI is too useful, and it's simple to tell a programmer not to screw up than reimplement POSIX in your language's core subroutines.

[1] "safe" rust, vs "unsafe" where memory unsafety is a promise upheld by the programmer, and violations may occur

1 comments

Is there a way to do FFI without having languages directly mutate each other's memory, but still within the same process. So all 'communication' between the languages happens by serializing data, no shared memory being used for FFI. But you don't get the massive overhead of having to launch a second process.

You are still depending on the called function not clobbering over random memory. But if the called function is well-behaved you would have a clean FFI.

In practice you run all your process-isolated code in one process as a daemon instead of spawning it per call.

I think you're on the right track in boiling down the problem to "minimize the memory shared between languages and restrict the interface to well defined semantics around serialized data" - which in modern parlance is called a "remote procedure call" protocol (examples are gRPC and JSON RPC).

It's interesting to think about how one could do an RPC call without the "remote" part - keep it all in the same process. What you could do is have a program with an API like this :

    int rpc_call (int fd); 
where `fd` is the file descriptor (could even be an anonymous mmap) containing the serialized function call, and the caller gets the return value by reading the result back.

One tricky bit is thread safety, so you'd need a thread local file for the RPC i/o.