It transliterates C to Rust all right, but the Rust isn't any safer than the C that goes in. Note the representation of an null-terminated string - it's an unsafe pointer to a byte. That's what it was in C, transliterated unsafely to Rust. Some safe Rust representation for C arrays is needed.
From the description of how it translates a FOR loop, it does so by compiling it down to the primitive operations and tests. A Rust FOR loop does not emerge. That needs idiom recognition for the common cases including, at least, "for (i=0; i<n; i++) {...}".
This is a big job, but it's good someone started on it.
A Rust module that exactly captures the semantics of a C source file is a Rust module that doesn't look very much like Rust. ;-) I would like to build a companion tool which rewrites parts of a valid Rust program in ways that have the same result but make use of Rust idioms. I think it should be separate from this tool because I expect it to be useful for other folks, not just users of Corrode. I propose to call that program "idiomatic", and I think it should be written in Rust using the Rust AST from syntex_syntax.
Not to look a gift horse in the mouth, but it seems like Corrode misses some other chances to use idiomatic Rust:
1. Rust fn:main doesn't need to return something.
2. The arguments to main aren't mutated, so Rust doesn't need to declare them as mutable.
3. Ditto for the argument to printf.
Anyone know how easy it is to recognize and code for such cases in the transpiler?
Edit: It looks like they might have opposite design goals [1]: "Corrode aims to produce Rust source code which behaves exactly the same way that the original C source behaved, if the input is free of undefined and implementation-defined behavior. ... If a programmer went to the trouble to put something in, I want it in the translated output; if it's not necessary, we can let the Rust compiler warn about it." (Edit2: cleaned up and numbered)
I think that keeping an exact one-to-one mapping makes this tool a lot more useful. There's no telling what code depends on C idioms that would be broken by using a Rust idiom instead. Generating 100% equivalent code means that programmers can make intelligent decisions about when to switch over to Rust idioms as they continue developing the program.
Yeah, once you've got equivalent Rust, the rest is just optimization that should probably be implemented in the Rust compiler. No reason to put that stuff in the niche transpiler.
> Anyone know how easy it is to recognize and code for such cases in the transpiler? Edit: It looks like they might have opposite design goals
Yes the author has explicitly noted that they want a compiler as syntax-directed as possible, semantics change would go against that grain. In that spirit, idiomatic alterations would be the domain of rust-land fixers and linters (e.g. `cargo wololo` or `cargo clippy | rustfix`)
So you could chain Corrode with one of those to get a C-to-idiomatic-Rust converter?
FWIW, I googled those; Clippy and rustfix just seemed to be linters that can't detect things like "you're not mutating this so drop `mut`", and I couldn't find wololo.
No, most real-world C code will expect a C `int` to be 32 bits, while `isize` is often 64 bits.
On the other hand, at least for Unix systems `long` is often equivalent to Rust's `isize`: 32 bits for 32-bit architectures, and 64 bits for 64-bit architectures, so it would make sense to convert `long` to `isize`.
They're different types. isize is ssize_t (well, intptr_t), in that it is tied to the size of the address space, while C's int is not constrained. In fact, it is usually 32 bits, even on 64-bit architectures, where isize is 64 bits.
Wow. So I did some sleuthing and apparently in Rust the maximum size of an object must fit in isize, not usize. That means on 32-bit architectures you can't have arrays larger than 2GB, whereas on Linux and similar systems 32-bit processes have access to 3GBs and even the full 4GBs of address space. It actually matters for things like mmap'ing files.
Technically, C's int is constrained. C defines a minimum range of values for all the datatypes. The minimum range for int is -32767 to +32767. long is -2147483647 to +2147483647. Though the discerning pendant will claim, ex post, to target something like POSIX (which increases the bound on int, defines char as 8 bits, etc) if you point out improper use of int.
One irony of criticisms against C is that people argue it's too low level, but that's often because people treat it as too low-level. For example, novice C programmers think of C integer types in terms of bit representations and infer value ranges. Good C programmers think of C integer types in terms of representable values, understand that bit representation (specifically, hardware representation) is almost always irrelevant, and understand how to leverage the unspecified upper bounds on value ranges to improve the longevity and portability of their software.
Languages which emphasize fixed-width integers are, in some sense, a retrogression. The real problem with C integer types is you won't see the folly in poor assumptions until it's too late. Languages like Ada addressed this with explicit ranges. But I guess that was too burdensome. Fixed-width integers is an appeasement of lazy programming. I admit to being lazy and using fixed-width integers in C more than I should, but at least I feel dirty about it.
Many of the compromises Rust makes are clearly informed by the _particular_ experiences of the core team. For example, the fact that most Rust developers are of the belief that malloc failure is not recoverable (a big hold-up in adding catch_unwind) is a reflection of their experience with large desktop software. Desktop software has very complex, interdependent, and less fine-grained transaction-oriented state. Recovering from malloc failure is very hard and of little benefit. Most server software, by contrast, has more natural and consistent transactional characteristics. Logical tasks have less interdependent state, so it's both easier and more beneficial to be able to recover from malloc failure.
I think some of the choices wrt integer types is similarly informed.
> the fact that most Rust developers are of the belief that malloc failure is not recoverable
This is untrue. The true statement is similar, but has different implications -- malloc failure is usually not recoverable, and nonrecoverable malloc failure should be the default, for the problem space Rust targets (which encompasses more than low-level things). You can recover from malloc in Rust, it just requires some extra work.
> Because the project is still in its early phases, it is not yet
> possible to translate most real C programs or libraries.
It is currently trying to port over semantics exactly, so the Rust code is far from idiomatic Rust. Doesn't mean it's not useful, just saying that it's trying to be 1:1.
I guess the next stage would involve translating common non-idiomatic patterns into idiomatic Rust. Looks like this could be a job for a community-managed database!
On the rust subreddit someone tongue-in-cheek suggested `cargo clippy | rustfix` to be used in conjunction with this tool for better rust code.
But that actually could work! Clippy has a ton of lints that make your code more idiomatic, and rustfix basically takes diagnostic output and applies suggestions (still WIP).
Clippy is geared towards making human-written unidiomatic code better, so it might not catch some silly things in this tool's output but or certainly could be extended to do that.
It tells you about places where you can improve your code. Possible pitfalls, style issues, documentation issues, unidiomatic code, everything.
Its a developer tool so you can use rustup to switch to nightly to run clippy (and use stable otherwise) and not impose nightly on the rest of the people who use the project. We have plans for making clippy a tool that you can fetch via rustup without requiring nightly.
This is best handled on a per-project or per-organization basis. I would have such a project concentrate on the tooling for maintaining and developing such databases.
[1] https://hackage.haskell.org/package/language-c