Hacker News
by DaGardner 3636 days ago
Like many "transpilers" / compilers, whatever you might name them, it lacks example input and output.

I want to see what my new Rust code base looks like. Does it compile with some heuristics, or does it just map C 1:1 to Rust primitives?

Here you go. It didn't like my stdio.h. Apparently enums and unions aren't supported, but:

    extern int printf(char *, ...);

    int main(int argc, char argv[]) {
        printf("Hello, world!\n");
        return 0;
    }
Was turned into:

    extern {
        fn printf(arg1 : *mut u8, ...) -> i32;
    }
    #[no_mangle]
    pub unsafe fn main(mut argc : i32, mut argv : *mut u8) -> i32 {
        printf(b"Hello, world!\n\0".as_ptr() as (*mut u8));
        0i32
    }
edit: Also worth noting, it removes all comments. I believe this to be a limitation of language-c [1]

[1] https://hackage.haskell.org/package/language-c

It transliterates C to Rust all right, but the Rust isn't any safer than the C that goes in. Note the representation of a null-terminated string: it's an unsafe pointer to a byte. That's what it was in C, transliterated unsafely to Rust. Some safe Rust representation for C arrays is needed.
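
As a rough illustration of what a safer representation could look like (my assumption, not something Corrode emits), the standard library's `CStr` can wrap the same NUL-terminated bytes and check the terminator once, so the rest of the code works with a borrowed value instead of a raw pointer. The helper names `greeting` and `greeting_len` are made up for this sketch.

```rust
use std::ffi::CStr;

// Hypothetical helper: validates the NUL terminator once and returns a safe,
// borrowed view of the same bytes the translated code passes to printf.
fn greeting() -> &'static CStr {
    CStr::from_bytes_with_nul(b"Hello, world!\n\0")
        .expect("literal must end in exactly one NUL byte")
}

// Length without the terminator, computed without any unsafe code.
fn greeting_len() -> usize {
    greeting().to_bytes().len()
}
```

Calling `greeting().as_ptr()` still yields a pointer suitable for FFI, so the unsafe part can shrink to the single call into C.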

From the description of how it translates a FOR loop, it does so by compiling it down to the primitive operations and tests. A Rust FOR loop does not emerge. That needs idiom recognition for the common cases including, at least, "for (i=0; i<n; i++) {...}".
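
To make the difference concrete, here is a hand-written sketch (not actual Corrode output) of the two shapes for `for (i = 0; i < n; i++) { sum += a[i]; }`. The function names are invented for the example.

```rust
// Shape of a primitive-by-primitive translation: explicit init, test, increment.
fn sum_lowered(a: &[i32], n: usize) -> i32 {
    let mut sum = 0i32;
    let mut i = 0usize;
    while i < n {
        sum += a[i];
        i += 1;
    }
    sum
}

// What an idiom-recognizing pass could produce for the same loop.
fn sum_idiomatic(a: &[i32], n: usize) -> i32 {
    let mut sum = 0i32;
    for i in 0..n {
        sum += a[i];
    }
    sum
}
```

The rewrite is only valid when the loop body never modifies `i` or `n`, which is why it needs analysis rather than plain syntax-directed translation. Note also that a C `continue` inside the lowered `while` form must still run the increment.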

This is a big job, but it's good someone started on it.

This is explained in the readme:

A Rust module that exactly captures the semantics of a C source file is a Rust module that doesn't look very much like Rust. ;-) I would like to build a companion tool which rewrites parts of a valid Rust program in ways that have the same result but make use of Rust idioms. I think it should be separate from this tool because I expect it to be useful for other folks, not just users of Corrode. I propose to call that program "idiomatic", and I think it should be written in Rust using the Rust AST from syntex_syntax.

Couldn't that be a more general second pass, Rust -> Rust?
Shouldn't a C `int` be converted to Rust's `isize`? I think that captures the spirit better.
Not to look a gift horse in the mouth, but it seems like Corrode misses some other chances to use idiomatic Rust:

1. Rust's `fn main` doesn't need to return anything.

2. The arguments to main aren't mutated, so Rust doesn't need to declare them as mutable.

3. Ditto for the argument to printf.

Anyone know how easy it is to recognize and code for such cases in the transpiler?

Edit: It looks like they might have opposite design goals [1]: "Corrode aims to produce Rust source code which behaves exactly the same way that the original C source behaved, if the input is free of undefined and implementation-defined behavior. ... If a programmer went to the trouble to put something in, I want it in the translated output; if it's not necessary, we can let the Rust compiler warn about it." (Edit2: cleaned up and numbered)

[1] https://github.com/jameysharp/corrode#design-principles

I think that keeping an exact one-to-one mapping makes this tool a lot more useful. There's no telling what code depends on C idioms that would be broken by using a Rust idiom instead. Generating 100% equivalent code means that programmers can make intelligent decisions about when to switch over to Rust idioms as they continue developing the program.
Yeah, once you've got equivalent Rust, the rest is just optimization that should probably be implemented in the Rust compiler. No reason to put that stuff in the niche transpiler.
> Anyone know how easy it is to recognize and code for such cases in the transpiler? Edit: It looks like they might have opposite design goals

Yes, the author has explicitly noted that they want a compiler that is as syntax-directed as possible; semantic changes would go against that grain. In that spirit, idiomatic alterations would be the domain of Rust-land fixers and linters (e.g. `cargo wololo` or `cargo clippy | rustfix`).

So you could chain Corrode with one of those to get a C-to-idiomatic-Rust converter?

FWIW, I googled those; Clippy and rustfix just seemed to be linters that can't detect things like "you're not mutating this so drop `mut`", and I couldn't find wololo.

1. A special case could be added for `main`, but it's no big deal.

2. This seems difficult as the C arguments were mutable; the algorithm would have to start doing analysis rather than direct translation.

3. Quite difficult to "know" that this printf doesn't write to its arguments, especially since the printf is manually declared.

Regarding 1, if you're still reading, it looks like they discuss what they'd have to do to move `main` to its correct Rust type:

https://github.com/jameysharp/corrode/issues/20
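
I haven't checked what the issue actually settles on, so treat this as one possible approach rather than the project's plan: keep the translated function C-shaped and put a thin Rust wrapper in front of it that builds `argc`/`argv`. Both `translated_main` and `call_translated_main` are hypothetical names.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// Stand-in for a translated C `main`; hypothetical, it just returns argc.
unsafe fn translated_main(argc: c_int, _argv: *mut *mut c_char) -> c_int {
    argc
}

// Builds a C-style, NULL-terminated argv from Rust strings and calls the
// translated function with it.
fn call_translated_main(args: &[&str]) -> c_int {
    // The CStrings own the bytes and must outlive the call.
    let owned: Vec<CString> = args
        .iter()
        .map(|a| CString::new(*a).expect("argument contains an interior NUL"))
        .collect();
    // Cast to *mut to match the C signature; the stand-in never writes through it.
    let mut argv: Vec<*mut c_char> = owned
        .iter()
        .map(|a| a.as_ptr() as *mut c_char)
        .collect();
    argv.push(std::ptr::null_mut());
    unsafe { translated_main(owned.len() as c_int, argv.as_mut_ptr()) }
}
```

A real `fn main()` would feed this from `std::env::args()` and pass the result to `std::process::exit`.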

No, most real-world C code will expect a C `int` to be 32 bits, while `isize` is often 64 bits.

On the other hand, at least for Unix systems `long` is often equivalent to Rust's `isize`: 32 bits for 32-bit architectures, and 64 bits for 64-bit architectures, so it would make sense to convert `long` to `isize`.
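
If you want to check this on your own target, the FFI aliases in `std::os::raw` track the platform's C types. A small sketch (the function name `integer_widths` is made up):

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_long};

// Returns (bytes in C `int`, bytes in C `long`, bytes in `isize`)
// for whatever target this is compiled for.
fn integer_widths() -> (usize, usize, usize) {
    (size_of::<c_int>(), size_of::<c_long>(), size_of::<isize>())
}
```

On x86-64 Linux this gives 4, 8 and 8 bytes. On 64-bit Windows `long` stays at 4 bytes, so the `long` to `isize` mapping only works for Unix-style targets.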

They're different types. isize is ssize_t (well, intptr_t), in that it is tied to the size of the address space, while C's int is not constrained. In fact, it is usually 32 bits, even on 64-bit architectures, where isize is 64 bits.
Wow. So I did some sleuthing and apparently in Rust the maximum size of an object must fit in isize, not usize. That means on 32-bit architectures you can't have arrays larger than 2 GB, whereas on Linux and similar systems 32-bit processes have access to 3 GB and even the full 4 GB of address space. It actually matters for things like mmap'ing files.
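
One way to see that limit from code, assuming a reasonably recent Rust release where `std::alloc::Layout` enforces it: a layout of exactly `isize::MAX` bytes is accepted and one byte more is rejected. The helper name is invented for the sketch.

```rust
use std::alloc::Layout;

// True if a byte-aligned allocation request of `size` bytes can even be
// described; nothing is actually allocated here.
fn layout_is_valid(size: usize) -> bool {
    Layout::from_size_align(size, 1).is_ok()
}
```

For example, `layout_is_valid(isize::MAX as usize)` holds while `layout_is_valid(isize::MAX as usize + 1)` does not, on both 32-bit and 64-bit targets.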

Technically, C's int is constrained. C defines a minimum range of values for all the datatypes. The minimum range for int is -32767 to +32767. long is -2147483647 to +2147483647. Though the discerning pedant will claim, ex post, to target something like POSIX (which increases the bound on int, defines char as 8 bits, etc) if you point out improper use of int.

One irony of criticisms against C is that people argue it's too low level, but that's often because people treat it as too low-level. For example, novice C programmers think of C integer types in terms of bit representations and infer value ranges. Good C programmers think of C integer types in terms of representable values, understand that bit representation (specifically, hardware representation) is almost always irrelevant, and understand how to leverage the unspecified upper bounds on value ranges to improve the longevity and portability of their software.

Languages which emphasize fixed-width integers are, in some sense, a retrogression. The real problem with C integer types is you won't see the folly in poor assumptions until it's too late. Languages like Ada addressed this with explicit ranges. But I guess that was too burdensome. Fixed-width integers are an appeasement of lazy programming. I admit to being lazy and using fixed-width integers in C more than I should, but at least I feel dirty about it.

Many of the compromises Rust makes are clearly informed by the _particular_ experiences of the core team. For example, the fact that most Rust developers are of the belief that malloc failure is not recoverable (a big hold-up in adding catch_unwind) is a reflection of their experience with large desktop software. Desktop software has very complex, interdependent, and less fine-grained transaction-oriented state. Recovering from malloc failure is very hard and of little benefit. Most server software, by contrast, has more natural and consistent transactional characteristics. Logical tasks have less interdependent state, so it's both easier and more beneficial to be able to recover from malloc failure.

I think some of the choices wrt integer types are similarly informed.

> the fact that most Rust developers are of the belief that malloc failure is not recoverable

This is untrue. The true statement is similar, but has different implications -- malloc failure is usually not recoverable, and nonrecoverable malloc failure should be the default, for the problem space Rust targets (which encompasses more than low-level things). You can recover from malloc failure in Rust, it just requires some extra work.
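
For readers on current Rust, here is a sketch of what that extra work can look like today. It uses `Vec::try_reserve_exact`, which was stabilized well after this thread, so it is not necessarily what the comment had in mind. `try_buffer` is a made-up name.

```rust
use std::collections::TryReserveError;

// Tries to get a zeroed buffer of `len` bytes and reports allocation failure
// as an Err instead of aborting the process.
fn try_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?;
    // Capacity is already reserved, so this does not allocate again.
    buf.resize(len, 0);
    Ok(buf)
}
```

A server could call this per request and turn the `Err` into an error response for that one task, which is the transactional recovery described above.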

I'm in no way connected to the project. Perhaps you should file an issue.
Only for ILP64 ABIs, which aren't common.
On some architectures `int` is 32 bits while `isize` is actually 64 bits, so no, that translation is definitely not the ideal one.

  > Because the project is still in its early phases, it is not yet
  > possible to translate most real C programs or libraries.
It is currently trying to port over semantics exactly, so the Rust code is far from idiomatic Rust. Doesn't mean it's not useful, just saying that it's trying to be 1:1.
I guess the next stage would involve translating common non-idiomatic patterns into idiomatic Rust. Looks like this could be a job for a community-managed database!
On the Rust subreddit someone tongue-in-cheek suggested `cargo clippy | rustfix` to be used in conjunction with this tool for better Rust code.

But that actually could work! Clippy has a ton of lints that make your code more idiomatic, and rustfix basically takes diagnostic output and applies suggestions (still WIP).

Clippy is geared towards making human-written unidiomatic code better, so it might not catch some silly things in this tool's output, but it certainly could be extended to do that.

I haven't used nightly much, what all does Clippy do?
It tells you about places where you can improve your code. Possible pitfalls, style issues, documentation issues, unidiomatic code, everything.

It's a developer tool, so you can use rustup to switch to nightly to run Clippy (and use stable otherwise) and not impose nightly on the rest of the people who use the project. We have plans for making Clippy a tool that you can fetch via rustup without requiring nightly.

Check out Clippy online! Go to http://play.integer32.com/, paste in your code, click "Clippy".
It's a linter.
This is best handled on a per-project or per-organization basis. I would have such a project concentrate on the tooling for maintaining and developing such databases.