Hacker News new | ask | show | jobs
by gwbas1c 700 days ago
Maybe they are asking the wrong questions?

Does Rust need to change to make it easier to call C?

I've done a bit of Rust, and (as a hobbyist,) it's still not clear (to me) how to interoperate with C. (I'm sure someone reading this has done it.) In contrast, in C++ and Objective C, all you need to do is include the right header and call the function. Swift lets you include Objective C files, and you can call C from them.

Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?

7 comments

Calling C from Rust can be quite simple. You just declare the external function and call it. For example, straight out of the Rust book https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#usin... :

  extern "C" {
      fn abs(input: i32) -> i32;
  }

  fn main() {
      unsafe {
          println!("Absolute value of -3 according to C: {}", abs(-3));
      }
  }
Now, if you have a complex library and don't want to write all of the declarations by hand, you can use a tool like bindgen to automatically generate those extern declarations from a C header file: https://github.com/rust-lang/rust-bindgen

There's an argument to be made that something like bindgen could be included in Rust, not requiring a third party dependency and setting up build.rs to invoke it, but that's not really the issue at hand in this article.

The issue is not the low-level bindings, but higher level wrappers that are more idiomatic in Rust. There's no way you're going to be able to have a general tool that can automatically do that from arbitrary C code.

There's also cbindgen for going the other way around. https://github.com/mozilla/cbindgen
Passing integers around is easy, sharing structs or strings and context pointers for use in callbacks crossing the language barrier etc is typically much harder.
For rust code calling C, sharing structs is doable with #[repr(C)]. See https://doc.rust-lang.org/reference/type-layout.html#reprc-s...

(Nitpick: I don’t think it technically is correct to call this “The C representation”, as strict layout in C depends on the C compiler/ABI. I wouldn’t trust this to be good enough for serializing data between 32-bit and 64-bit systems, for example. For calling code on the same system, it’s good enough, though)

That's not really "simple", it's on par with C FFI in about any other language (except C++), with same drawbacks.
It's on par with C++, too. In C++ you need an `extern "C"`, because C++ linkage isn't guaranteed to be the same as C linkage. You can get away with wrapping that around it in a preprocessor conditional, but that's not all that much easier than Rust's bindgen.

A lot of C to C++ interop is actually done wrong without knowing it. Throwing a C++ static function as a callback into a C function usually works, but it's not technically correct because the linkage isn't guaranteed to be the same without an extern "C". In practice, it usually is the same, but this is implementation-defined, and C++ could use a different calling convention from C (e.g. cdecl vs fastcall vs stdcall. The Borland C++ compiler uses fastcall by default for C++ functions, which will make them illegal callbacks for C functions).

The major difference between Objective-C and C++'s C interop and other languages is the lack of the preprocessor. Macros will just work because they use the same preprocessor. That's really not easy to paper over in other languages that can't speak the C preprocessor.

I think you're confusing some terms here.

> In C++ you need an `extern "C"`, because C++ linkage isn't guaranteed to be the same as C linkage.

`extern "C"` has nothing to do with linkage, all it does is disable namemangling, so you get the same symbol name as with a C compiler.

> Throwing a C++ static function as a callback into a C function usually works, but it's not technically correct because the linkage isn't guaranteed to be the same without an extern "C".

Again, linkage is not relevant here. Your C++ callbacks don't have to be declared as extern "C" either, because the symbol name doesn't matter. As you noted correctly, the calling conventions must match, but in practice this only matters on x86 Windows. (One notable example is passing callbacks to Win32 API functions, which use `stdcall` by default.) Fortunately, x86_64 and ARM did away with this madness and only have a single calling convention (per platform).

> `extern "C"` has nothing to do with linkage, all it does is disable namemangling, so you get the same symbol name as with a C compiler.

extern "C" also ensures that the C calling convention is used, which is relevant for callbacks. It's not just name mangling. This is the reason that extern "C" static functions exist. You can actually overload a C++ function by extern "C" vs extern "C++", and it will dispatch it appropriately based on whether the passed in function is declared with C or C++ linkage.

And I'm not sure the terms are confused, because that's how most documentation refers to it: https://learn.microsoft.com/en-us/cpp/cpp/extern-cpp?view=ms...

> In C++, when used with a string, extern specifies that the linkage conventions of another language are being used for the declarator(s). C functions and data can be accessed only if they're previously declared as having C linkage. However, they must be defined in a separately compiled translation unit.

And https://en.cppreference.com/w/cpp/language/language_linkage

The post you're replying to had it completely right. extern "C" is entirely about linkage, which includes calling convention and name mangling.

> As you noted correctly, the calling conventions must match, but in practice this only matters on x86 Windows.

Or if you want your program to actually be correct, instead of just incidentally working for most common cases, including on future systems.

If you're passing a callback to a C function from C++, it's wrong unless the callback is declared extern "C".

> extern "C" also ensures that the C calling convention is used, which is relevant for callbacks. It's not just name mangling.

I stand corrected. I didn't know that `extern "C"` enforces the C calling convention.

However, on modern platforms this doesn't really matter because, as I said, there is only a single calling convention (per platform). And I'm pretty sure that future platforms will keep it that way. Fortunately, if you try to pass a C++ callback of the wrong calling convention, you get a compiler error.

> If you're passing a callback to a C function from C++, it's wrong unless the callback is declared extern "C".

That's certainly not true because `extern "C"` is not the only way to specify the calling convention. In fact, you might need a different calling convention! As I mentioned, on x86 the Windows API uses stdcall for all API functions and callbacks, so `extern "C"` would be wrong. If you look at the Microsoft examples, you will see that they declare the callbacks as WINAPI (without `extern "C"`): https://learn.microsoft.com/en-us/windows/win32/procthread/c...

So I stand by my point that in practice you don't need `extern "C"` for passing C++ callbacks to C functions. You can pass a lambda function just fine, and when it doesn't work the compiler will tell you.

How is that not simple? You just declare the function and then call it. I find it hard to imagine how it could be any more simple than that.
Now imagine a hundred or two functions, structures and callbacks, some of them exposed only as CPP macros over internal implementation. PJSIP low level API is one example.
But... that's what bindgen is for. Which I mentioned.

I said it "can be quite simple"; for simple use cases, just using extern and translating the declarations by hand is perfectly viable.

For more complex cases, you use bindgen.

Bindings generators exist in most other languages with same limitations.

I would love to see how bindgen would handle a function call defined as a preprocessor macro that I mentioned. Because most likely it won't.

Can someone shed some light on why the parent comment (by varjag) is downvoted?
... And? Most languages make C interop simple.
They quickly become unwieldy on non-trivial APIs, with hundreds of definitions across dozens of files and with macros to boot. Naturally people would still get the job done but it's beyond simple.
That's what bindgen is for, as was mentioned in the original comment you replied to.
How well does it handle preprocessor macros in APIs?
This is not a notable challenge in rust, nor relevant to the article.

The article is about finding ways of using rust to actually implement kernel fs drivers/etc. Note that any rust code in the kernel is necessarily consuming C interfaces.

Bindgen works quite well for the use case that you are thinking.

https://github.com/rust-lang/rust-bindgen

Yeah, the Rust proponents are being significantly more ambitious. Not just the ability to code a file system in Rust, but do it in a way that catches a lot of the correctness issues relating to the complex (and changing) semantics of FS development.
It's actually pretty easy. All you need is declare `extern "C" fn foo() -> T` to be able to call it from Rust, and to pass the link flags either by adding a #[link] attribute or by adding it in a build.rs.

You can use the bindgen crate to generate bindings ahead of time, or in a build.rs and include!() the generated bindings.

Normally what people do is create a `-sys` crate that contains only bindings, usually generated. Then their code can `use` the bindings from the sys crate as normal.

> in contrast, in C++ and Objective C, all you need to do is include the right header

and link against the library.

The point is that Rust can model invariants that C can't. You can call both ways, but if C is incapable of expressing what Rust can, that has important implications for the design of APIs which must be common to both.
That's not how I interpreted it: There is a clear need to be able to write filesystems in Rust, and the kernel developer(s) who write the filesystem API don't want to have to maintain the bindings to Rust.
They say this in almost every paragraph! For example, five of the first seven paragraphs:

> The first is to express more of the requirements using Rust's type system in order to catch more mistakes at compile time.

> Almeida showed an example of how the Rust type system can eliminate certain kinds of errors.

> … it was exactly that kind of discussion/argument that could be avoided by encapsulating the rules into the Rust types and abstractions; the compiler will know the right thing to do.

> … All of that is enforced through the type system.

> the whole idea is to determine what the constraints are from Viro and other filesystem developers, then to create types and abstractions that can enforce them.

More explicitly:

> The object lifecycles are being encoded into the Rust API, but there is no equivalent of that in C; if someone changes the lifecycle of the object on one side, the other will have bugs.

> As those changes occur, "we will find out whether or not this concept of encoding huge amounts of semantics into the type system is a good thing or a bad thing".

> In addition, when the C code changes, the Rust code needs to follow along, but who is going to do that work? Almeida agreed that it was something that needs to be discussed.

FWIW: I shipped a Windows file system driver in 2020. The api hadn't changed in years. Does Linux's API for kernel-space filesystems really change so rapidly that keeping the rust bindings up-to-date would be a considerable amount of work, in the long run?

> Does Rust need to change to make it easier to call C?

No, because it's already dirt-simple to do. You just declare the C function as 'extern "C"', and then call it. (You will often need to use 'unsafe' and convert or cast references to raw pointers, but that's simple syntax as well.)

There are tools (bindgen being the most used) that can scan C header files and produce the declarations for you, so you don't have to manually copy/paste and type them yourself.

> Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?

I think you maybe misunderstood the article? There's nothing wrong with the language here. The argument is around how Rust should be used. The Rust-for-Linux developers want to encode semantics into their API calls, using Rust's features and type system, to make these calls safer and less error-prone to use. The people on the C side are afraid that doing so will make it harder for them to evolve the behavior and semantics of their C APIs, because then the Rust APIs will need to be updated as well, and they don't want to sign up for that work.

An alternative that might be more palatable is to not make use of Rust features and the type system in order to encode semantics into the Rust API. That way, it will be easier for C developers, as updating Rust API when C API changes will be mechanical and simple to do. But then we might wonder what the point is of all this Rust work if the Rust-for-Linux developers can't use Rust some features to make better, safer APIs.

> I've done a bit of Rust, and (as a hobbyist,) it's still not clear (to me) how to interoperate with C.

Kinda weird that you currently have the top-voted comment when you admit you don't understand the language well enough to have an informed opinion on the topic at hand.

I’ve written Rust code that called C++

It wasn’t completely straightforward, but on the whole I figured out everything I needed to within a few days in order to be able to do it.

Calling C would surely be very similar.

If you like to see some examples of C bindings:

https://github.com/tree-sitter/tree-sitter/blob/25c718918084...