by Animats 3075 days ago
Rust's approach to "unsafe" is to let the programmer do whatever they want. Having to use this for UNIX-type API calls is kind of lame.

I once proposed extending C to allow talking about array sizes.[1] You'd define "read" as

    int read(int fd, char &buf[len], size_t len);

The compiler now knows that "buf" is an array with length "len", and can check calls for "buf" being the right size. The generated code for the call is the same; this doesn't require array descriptors. It just says which parameter defines the length of the array.

All the original UNIX calls and most of the Linux ones fit into that simple model. If the size of something is hard to define simply at an API call, the API has a problem.

Rust's system for external C calls should be more like that and less about casts to raw pointers. It's technically possible to fix this in C, and have a "strict mode", but the political problems are too hard.

[1] http://www.animats.com/papers/languages/safearraysforc43.pdf

3 comments

> Rust's system for external C calls should be more like that and less about casts to raw pointers.

It seems a rosy-eyed view to think that this would help safety significantly, and it would require a lot of effort: it's likely to have a much lower pay-off than other things, such as investing in sanitizers, or just doing the work of writing safe wrappers for popular C libs, which removes C FFI concerns for most people, who can just use the Rust library.

Specifically, as you say, C doesn't have this information, meaning there's no way for Rust's (or another language's) FFI to work like this automatically. Instead, someone will have to annotate the C code, have some extra "notes" layer, or annotate the imported Rust declarations. Either way, there's a human element, meaning a place for mistakes to be made. It seems like the less-duplicative way to do this is to make Rust wrappers that take Rust slices, since these will be wanted in the end anyway.

Of course you want to use Rust slices. Those map directly to the kind of C array I outlined. If you could declare a C API that way to Rust, you'd get the mapping without talking about pointers explicitly at all.

What I'm arguing for is a declarative way to talk about C interfaces that is consistent with Rust's model. This is better than using "unsafe" to construct C-type raw pointers. Yes, this is more restrictive and there will be some awful C APIs you can't describe. That's a good indication said C API is trouble.
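The claim that a Rust slice maps directly onto a C `(buf, len)` pair can be sketched as follows. This is only an illustration: `fill_bytes` and `fill` are made-up names, and `fill_bytes` is a stand-in defined in Rust with the C ABI so the sketch needs no external library.

```rust
use std::slice;

/// Hypothetical stand-in for a C function that takes a `(buf, len)` pair.
///
/// # Safety
/// `buf` must point to at least `len` writable bytes.
pub unsafe extern "C" fn fill_bytes(buf: *mut u8, len: usize, value: u8) -> isize {
    // Rebuild a slice from the raw parts, much as a C callee would index buf[0..len].
    let out = unsafe { slice::from_raw_parts_mut(buf, len) };
    for b in out.iter_mut() {
        *b = value;
    }
    len as isize
}

/// Safe wrapper: the pointer and the length both come from the same slice,
/// so the caller cannot pass a length that disagrees with the buffer.
pub fn fill(buf: &mut [u8], value: u8) -> isize {
    unsafe { fill_bytes(buf.as_mut_ptr(), buf.len(), value) }
}
```

A caller writes `fill(&mut buffer, 0)` and never handles a raw pointer; the generated call still passes just a pointer and a length.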

What would make this "declarative way to talk about C interfaces" less error prone than something like this?

    use std::os::raw::{c_char, c_int};

    extern "C" {
        #[link_name = "read"]
        fn c_read(fd: c_int, buf: *mut c_char, len: usize) -> isize;
    }

    pub fn read(fd: c_int, buf: &mut [c_char]) -> isize {
        unsafe { c_read(fd, buf.as_mut_ptr(), buf.len()) }
    }

Further, note that this is insufficient for an idiomatic Rust API. You would also want to wrap the file descriptor (perhaps not for all C APIs) and the return value (definitely applies to all C APIs). So it would really look more like this:

    pub struct File { fd: c_int }

    impl File {
        pub fn read(&self, buf: &mut [u8]) -> Result<usize, ReadError> {
            // `c_read` takes `*mut c_char`, so cast the `u8` pointer.
            let r = unsafe { c_read(self.fd, buf.as_mut_ptr() as *mut c_char, buf.len()) };
            if r == -1 {
                // `ReadError` and `errno` stand in for a real error type and the OS error code.
                Err(ReadError::from(errno))
            } else {
                Ok(r as usize)
            }
        }
    }

I can certainly imagine a way to do that declaratively, but not in a way that helps even this most basic of examples. (Also, note that constructing raw pointers is completely safe; `as_mut_ptr`, for example, is a safe method.)

That's not bad. It would be useful to be able to use some kind of "C slice" in an extern fn declaration, so you could talk about arrays rather than pointers. Same function call code, but more Rust-like syntax. Then you don't need unsafe imperative code at all.

This would put all the memory-risky stuff in declarations of external functions.
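Rust has no such "C slice" declaration syntax, but the idea of keeping the risky part in the declaration can be approximated with a macro. The sketch below is only that: `slice_api!`, `read_slice`, and `fake_read` are invented names, and `fake_read` is a Rust-defined stand-in for a C function so the example links on its own.

```rust
/// Hypothetical stand-in for a C API taking `(fd, buf, len)`.
///
/// # Safety
/// `buf` must point to at least `len` writable bytes.
pub unsafe extern "C" fn fake_read(_fd: i32, buf: *mut u8, len: usize) -> isize {
    // Write 0, 1, 2, ... into the caller's buffer.
    for i in 0..len {
        unsafe { *buf.add(i) = i as u8 };
    }
    len as isize
}

/// Declares a safe, slice-taking wrapper around a raw `(fd, ptr, len)` function.
macro_rules! slice_api {
    ($vis:vis fn $name:ident => $raw:ident) => {
        $vis fn $name(fd: i32, buf: &mut [u8]) -> isize {
            // The only unsafe code lives here, inside the declaration machinery.
            unsafe { $raw(fd, buf.as_mut_ptr(), buf.len()) }
        }
    };
}

// One declarative line per API; no hand-written unsafe at the use site.
slice_api!(pub fn read_slice => fake_read);
```

The macro only moves the `unsafe` block; whoever writes the `slice_api!` line still has to be right that the raw function really treats its arguments as a buffer and its length.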

> I once proposed extending C to allow talking about array sizes.[1]

That would be a very useful, and relatively unobtrusive, extension to C. I've always liked the idea of a C "strict mode". I wish the political problems weren't so hard.

That's for local variables. Microsoft and Linus Torvalds didn't like it, because it's a way to suddenly cause unexpected stack growth of arbitrary size. That feature (variable-length arrays) was made optional in C11, and Microsoft never implemented it.

FWIW Microsoft does have SAL annotations to do the same thing. For example, fread's prototype is

    size_t fread(
        _Out_writes_bytes_(_ElementSize*_Count) void * _DstBuf,
        _In_ size_t _ElementSize,
        _In_ size_t _Count,
        _Inout_ FILE * _File
    );

https://docs.microsoft.com/en-us/visualstudio/code-quality/a...

C++ compilers also have references to arrays which can be abused in some cases:

    template <size_t len> int read(int fd, char (&buf)[len]); // array size will be inferred
    int read(int fd, char (&buf)[1024]); // array size must be exactly 1024
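For comparison, Rust can express roughly the same two signatures with const generics and fixed-size array references. A minimal sketch, with hypothetical names (`read_into`, `read_block`) and an in-memory byte slice standing in for the file descriptor:

```rust
/// Accepts a mutable reference to an array of any size; `N` is inferred at the call site.
/// Copies as many bytes as fit from `src` and returns how many were copied.
pub fn read_into<const N: usize>(src: &[u8], buf: &mut [u8; N]) -> usize {
    let n = src.len().min(N);
    buf[..n].copy_from_slice(&src[..n]);
    n
}

/// Accepts only a buffer of exactly 1024 bytes; any other size fails to compile.
pub fn read_block(src: &[u8], buf: &mut [u8; 1024]) -> usize {
    read_into(src, buf)
}
```

As with the C++ versions, the length is part of the type, so it is checked at compile time and never passed separately by the caller.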