Hacker News
by macgyverismo 508 days ago
I can see how the author came to the conclusion that libmodem written in rust would have prevented the issue, but isn't it simply pushing the problem further down the stack?

The author needed to use unsafe in order to pass his pointer to libmodem, but libmodem is going to require a pointer with static lifetime itself, which would have prevented the issue in the first place had the author supplied one.

I can see why you wouldn't want to use static, it hinders testability, but that means you need to ensure that the pointer you supply libmodem outlives libmodem. I would use RAII to do that in C++ and I am sure in rust you could/would do the same.

I guess I am asking, is there anything here that a libmodem written in rust would have magically solved? It feels like wishful thinking, but I am open to learn where I am mistaken.

In any case, kudos for finding this bug. Having worked with Zephyr/nRF Connect SDK and this exact chip myself, I can definitely relate to the pain they (can) bring.

4 comments

The existing C interface doesn't have means to describe the lifetime of the data being passed in. It just takes a pointer. An experienced C programmer would often understand what's happening by convention and not encounter the problem.

But the custom Rust wrapper was composed as a game of telephone (ugh), with the author blindly mimicking "Jonathan" who seemed to have been blindly mimicking a sloppy (and later repaired) example from Nordic.

The argument is that if the library and its internals were originally written in Rust, which has richer semantics for object lifetimes, Rust would have been able to formally convey that the input data needed to outlive the individual function call, throwing an error at compile time.

The wrapper could have enforced this constraint itself, as it probably does now, but the handoff between Rust and C needs somebody to understand the by-convention stuff in C and express it formally in Rust, and that human process failed to happen here.
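To make the "formally convey" part concrete, here is a minimal sketch of what such a signature could look like. Everything in it is hypothetical: `Config`, `modem_init`, and `modem_rx_buffer_size` are made-up names, not the real libmodem or wrapper API, and `std::sync::OnceLock` stands in for whatever storage a `no_std` embedded build would actually use.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the modem configuration struct.
pub struct Config {
    pub rx_buffer_size: usize,
}

// The library keeps the reference around after init returns,
// so the type of this slot forces the reference to be `'static`.
static STORED: OnceLock<&'static Config> = OnceLock::new();

/// Hypothetical init function. The `'static` in the signature is the
/// machine-checked version of "this must outlive the call".
pub fn modem_init(cfg: &'static Config) -> Result<(), &'static str> {
    STORED.set(cfg).map_err(|_| "already initialised")
}

/// Hypothetical later call that reads through the stored reference.
pub fn modem_rx_buffer_size() -> Option<usize> {
    STORED.get().map(|cfg| cfg.rx_buffer_size)
}

/// A config with static storage satisfies the bound.
pub static CONFIG: Config = Config { rx_buffer_size: 1024 };

pub fn example_usage() -> Option<usize> {
    // The result is ignored so that calling this twice is harmless.
    let _ = modem_init(&CONFIG);

    // A stack-allocated config is rejected at compile time, because
    // `local` does not live long enough to be borrowed for `'static`:
    //
    //     let local = Config { rx_buffer_size: 64 };
    //     modem_init(&local);

    modem_rx_buffer_size()
}
```

With a signature like this, the mistake from the article would not get past the compiler unless someone reached for unsafe to manufacture the `'static` reference.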

Part of the issue is that there's not really a convention in C. If it's not documented, you should probably read the source code to find out. (C programmers often think there's a convention, but that's because one option seems obvious to them. Other programmers will have a different 'obvious' option, which is why this is so often not documented at all.)
If I read the article correctly, Nordic changed the rules on this function without saying anything. It used to work with a stack-allocated config and now it doesn’t. The only way for the caller to know about that in C is a comment.
That's possible, but unlikely given conventions in embedded development and how something like this interface would generally need to work.

More likely (but not necessarily), Nordic's early example was either buggy or conditionally valid (benefiting from other implicit details of their implementation) and then was revised, either because the mistake was identified or because something else about the example changed.

That's all pretty common in this domain. Inadvertently stumbling because you uncritically followed some vendor example is also pretty common and completely understandable. Better tools, like using a language with richer semantics, are indeed something that can help with that.

I dunno, in my experience if you see something worded the way this was:

    int something_init(const something_init_params* params);
the convention is that the params are temporary -- really just a way of passing a bunch of parameters to the init function. It would be a surprise that the params are expected to be static. E.g., the whole STM32 HAL is done this way, and it would be a disaster if you thought the init structs all had to be static!

BTW, you can see the assumptions of the non-embedded programmers talking about "taking ownership" being the default interpretation of a signature like that...if you don't have a heap, what does that even mean?

In any case, C is a mess, embedded is a mess, no argument there!

Making a copy is much less common in embedded if it's a large data structure.
Not to mention that one of the things experienced people learn is that vendor code is hot flaming garbage and must never be trusted. Writing a Rust implementation based on vendor code is like building a skyscraper on a landfill. Don't do that. If you have to do that, tread bloody carefully!

I am more on the hardware side these days, but Nordic's hardware docs are pretty crap. As in, they're pretty, and they're crap. (The prettiness lulls people, especially managers, into a false sense of confidence. Don't fall for the trap!) There are obvious poor choices in there, and if you call FAEs out on them, they say to just follow the docs. Experienced engineers should not follow the docs.

I can't see their software side being any better.

> I guess I am asking, is there anything here that a libmodem written in rust would have magically solved?

I'm not following your comment, but I think the point is simply "the lifetime of the config is in the function signature, rather than hopefully (sometimes) being in the documentation, and hopefully (sometimes) correct".

It sounds like one function in libmodem accepts a pointer to a configuration struct, then stores that pointer (or an interior pointer from within it), which is later used by another libmodem function. If all of libmodem were written in Rust, this could be done without any use of unsafe, but it would require the lifetime on the original "reference" to provably outlive the second function getting called, probably by being static.
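For what it's worth, static is not the only way to express "outlives the second call". A rough sketch of the alternative, with made-up names (`Config`, `Modem`, `init`, and `rx_buffer_size` are all hypothetical and not taken from libmodem): the first function returns a handle that borrows the config, and the second function is a method on that handle, so the borrow checker ties the two lifetimes together.

```rust
/// Hypothetical stand-in for the modem configuration struct.
pub struct Config {
    pub rx_buffer_size: usize,
}

/// Hypothetical handle that holds on to the config reference.
/// The `'cfg` parameter records that the handle must not outlive the config.
pub struct Modem<'cfg> {
    cfg: &'cfg Config,
}

impl<'cfg> Modem<'cfg> {
    /// First call: stores the reference inside the handle.
    pub fn init(cfg: &'cfg Config) -> Self {
        Modem { cfg }
    }

    /// Later call: reads through the stored reference.
    pub fn rx_buffer_size(&self) -> usize {
        self.cfg.rx_buffer_size
    }
}

pub fn example_usage() -> usize {
    // A stack-allocated config is fine here, because the handle
    // is dropped before the config goes out of scope.
    let cfg = Config { rx_buffer_size: 256 };
    let modem = Modem::init(&cfg);
    let size = modem.rx_buffer_size();
    size
    // Trying to return `modem` itself from this function would be
    // rejected by the borrow checker, since it still borrows `cfg`.
}
```

This only works if the library state lives in a handle the caller owns. If the library keeps global state behind the scenes, as it sounds like libmodem does, a `'static` bound is the more honest signature.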
The author mentioned in the first chapter that everything works fine in rust, since it solves all problems. So I guess they throw "better in rust" against every problem.

The assumption that nobody ever makes mistakes is mistake one.