Hacker News new | ask | show | jobs
by smolder 1665 days ago
> FFI is inherently memory-unsafe

Maybe this specific problem needs attention. I wonder, is there a way we can make FFI safer while minimizing overhead? It'd be nice if an OS or userspace program could somehow verify or guarantee the soundness of function calls without doing it every time.

If we moved to a model where everything was compiled AOT or JIT locally, couldn't that local system determine soundness from the code, provided we use things like Rust or languages with automatic memory management?

2 comments

Presumably as long as FFIs are based on C calling conventions and running native instructions it would be unsafe. You could imagine cleaner FFIs that have significant restrictions placed on them (I'd imagine sandboxing would be required) but if the goal is to have it operate with as little overhead as possible, then the current regime is basically what you end up with and it would be decidedly unsafe.
This is a really hard problem because you have to discard the magic wand of a compiler and look at what is really happening under the hood.

At its most rudimentary level, a "memory safe" program is one that does not access memory that is forbidden to it at any point during execution. Memory safety can be achieved using managed languages or subsets[1] of languages like Rust - but that only works if the language implementations have total knowledge of memory accesses that the program may perform.

The trouble with FFIs is that by definition, the language implementations cannot know anything about memory accesses on the other side of the interface - it is foreign, a black box. The interface/ABI does not provide details about who is responsible for managing this memory, whether it is mutable or not, if it is safe to be reused in different threads, indeed even what the memory "points to."

On top of that, most of the time it's on the programmer to express to the implementation what the FFI even is. It's possible to get it wrong without actually breaking the program. You can do things like ignore signedness and discard mutability restrictions on pointers with ease in an FFI binding. There's nothing a language can do to prevent that, since the foreign code is a black box.

Now there are some exceptions to this, the most common probably being COM interfaces defined by IDL files, which are language agnostic and slightly safer than raw FFI over C functions. In this model, languages can provide some guarantees over what is happening across FFI and which operations are allowed (namely, reference counting semantics).

The way around all of this is simple : don't share memory between programs in different languages. Serialize to a data structure, call the foreign code in an isolated process, and only allow primitive data (like file descriptors) across FFI bounds.

This places enormous burden on language implementers, fwiw, which is probably why no one does it. FFI is too useful, and it's simple to tell a programmer not to screw up than reimplement POSIX in your language's core subroutines.

[1] "safe" rust, vs "unsafe" where memory unsafety is a promise upheld by the programmer, and violations may occur

Is there a way to do FFI without having languages directly mutate each other's memory, but still within the same process. So all 'communication' between the languages happens by serializing data, no shared memory being used for FFI. But you don't get the massive overhead of having to launch a second process.

You are still depending on the called function not clobbering over random memory. But if the called function is well-behaved you would have a clean FFI.

In practice you run all your process-isolated code in one process as a daemon instead of spawning it per call.

I think you're on the right track in boiling down the problem to "minimize the memory shared between languages and restrict the interface to well defined semantics around serialized data" - which in modern parlance is called a "remote procedure call" protocol (examples are gRPC and JSON RPC).

It's interesting to think about how one could do an RPC call without the "remote" part - keep it all in the same process. What you could do is have a program with an API like this :

    int rpc_call (int fd); 
where `fd` is the file descriptor (could even be an anonymous mmap) containing the serialized function call, and the caller gets the return value by reading the result back.

One tricky bit is thread safety, so you'd need a thread local file for the RPC i/o.