by smasher164 36 days ago
What I don't understand is this: if they were going to translate Zig to unsafe Rust, why not just build a translation tool for it? You could do a one-to-one mapping of language constructs, hardcoding patterns in your codebase, and as one friend put it "Tbh they could've just hooked up zig translate-c to c2rust". They would get deterministic translation, it probably would not have been a heavy investment to build, and the output would have the same assurances as the input.

In this case, I would trust the output even less than the input. The input was memory-unsafe but hand-written. The output is memory-unsafe but also vibe-coded and has had no eyeballs on it. What is the point of abusing agentic AI for this use-case?


> "Tbh they could've just hooked up zig translate-c to c2rust".

Have you ever seen what comes out of c2rust? It's awful. It relies on a library of functions which emulate unsafe C pointer semantics with unsafe Rust.

A few years ago, when I was struggling with bugs in OpenJPEG (a JPEG 2000 decoder), someone tried running it through c2rust. The converted unsafe rust segfaulted at the same place the C code did. It's compatible, but not safe.

Main insight: don't do string manipulation in C or unsafe Rust. It's totally the wrong tool for the job.

> Have you ever seen what comes out of c2rust? It's awful. It relies on a library of functions which emulate unsafe C pointer semantics with unsafe Rust.

which is somewhat close to what their port produced...

Their goal from the get-go was to have something mostly the same as the Zig, "just in Rust", which implies mostly unsafe Rust and all the soundness/memory issues Zig has (plus probably some more, due to an AI-based port instead of a tool like c2rust).

The thing is, if you don't keep things mostly 1:1, with all the problems that entails, there is absolutely no way to review that PR or catch the AI going rogue with hallucinations etc. With a mostly 1:1 port you can at least check if things seem mostly the same.

But it also means this is just step 1 of very many, with the others being incrementally fixing soundness, removing unsafe, and (hopefully) making the code more idiomatic...

(To get to the actual question of why: I think the answer is that doing this port using AI is likely way easier/faster than first writing a tool, which needs an in-depth understanding of both languages, especially given that some features in Zig do not map 1:1 to Rust. Fuzzy mapping is what LLMs are good at and what hand-written tools tend to be very bad at.)

Their zig used smart pointers and aiui the port retained those constructs.
> The converted unsafe rust segfaulted at the same place the C code did. It's compatible, but not safe

That is indeed the point of c2rust. It gives you a baseline that is semantically identical to the original codebase, and with that passing the full test suite, bug-for-bug, you can then start gradually adopting rusty idioms to improve the memory safety of the codebase.
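
To give a flavor of that progression, here is a minimal sketch. It is a hand-written illustration, not actual c2rust output, and both function names are made up for the example. The first function keeps the C-style raw pointer plus length; the second is the kind of safe rewrite you would do later, once the baseline passes the tests.

```rust
// Hypothetical illustration of the "baseline first, idioms later" workflow.
// Neither function is real c2rust output; the names are invented here.

/// Transliterated style: raw pointer plus length, as the C function took them.
/// Safety: `data` must point to at least `len` readable bytes.
unsafe fn sum_bytes_raw(data: *const u8, len: usize) -> u32 {
    let mut total: u32 = 0;
    let mut i: usize = 0;
    while i < len {
        // Unchecked pointer arithmetic, the same as `data[i]` in C.
        total = total.wrapping_add(unsafe { *data.add(i) } as u32);
        i += 1;
    }
    total
}

/// Idiomatic rewrite: the slice carries its own length, so no `unsafe` is needed.
fn sum_bytes(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32))
}

fn main() {
    let bytes = [1u8, 2, 3, 250];
    let raw = unsafe { sum_bytes_raw(bytes.as_ptr(), bytes.len()) };
    // Both versions must agree before the unsafe one can be retired.
    assert_eq!(raw, sum_bytes(&bytes));
    println!("{raw}");
}
```

The test suite is what lets you swap one for the other with some confidence: same inputs, same outputs, one function at a time.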

What comes out of c2rust is not intended for human consumption. It's more verbose than the original and harder to work on, but no safer. You lose the C idioms that people understand, while not gaining Rust idioms. It's like working on compiler-generated assembly code by hand.

2022 discussion on HN.[1]

There's a DARPA funded effort called TRACTOR, Translate All C To Rust, which has funded some efforts to develop a usable translator.[2] It's about 10 months after award, with no reported progress. I've been checking the personal sites of the academics involved, and they barely mention the project, although $5 million has been allocated to it.[3] The approach comes from U.C. Berkeley - let the LLM generate slop, check it using formal methods.[4] Not expecting near-term results.

[1] https://news.ycombinator.com/item?id=30169263

[2] https://csl.illinois.edu/news-and-media/translating-legacy-c...

[3] https://chandrasekaran-group.github.io/

[4] https://metalift.pages.dev/

Here is a public report from the TRACTOR evaluation team: https://github.com/DARPA-TRACTOR-Program/Reports/blob/main/F...

There are also some papers being published that were funded by TRACTOR, such as https://homes.cs.washington.edu/~mernst/pubs/c-rust-macros-p...

Evaluations of six translators. That's real progress.

Here are the test cases for evaluation #1.[1] There's good coverage of the C language, but the individual tests are mostly simple exercises of one C feature. The next round of test cases will probably be closer to useful programs.

[1] https://github.com/DARPA-TRACTOR-Program/PUBLIC-Test-Corpus

> let the LLM generate slop, check it using formal methods

I'm much more bullish on the opposite approach. Perform the naive translation, let the LLM loose on cleaning it up...

> What comes out of c2rust is not intended for human consumption.

That doesn’t really mean anything.

> It's more verbose than the original and harder to work on, but no safer. You lose the C idioms that people understand, while not gaining Rust idioms.

Yes, and?

The value of c2rust is that you now have the entire codebase working with the rust toolchain, you’re not juggling toolchains and you’re not managing a wavefront of FFI, only a wavefront of unsafe.

C2rust is not the end, it’s the start. It’s never claimed to be anything more (the official website even mentions Galois and Immunant are working on tooling to convert unsafe to safe / idiomatic Rust, though I don’t know if that got anywhere yet).

c2rust can generate UB in Rust even when there is no UB in the original C. It isn't bug-free, and C and Rust's undefined behavior semantics overlap but aren't identical.

One example: https://github.com/immunant/c2rust/issues/1678

I firmly believe the right way to port C and C++ (and Zig) programs to Rust is to do it module by module ("Ship of Theseus"). It needs scrutiny by folks who know both languages deeply, and you can port test cases too so you can detect UB at runtime (using tools like Miri). That's what fish did, and their port has been quite successful.

Blindly trusting the results of a machine translation is never a good idea. Especially when the translator has a temperature parameter.

The module with the code mentioned is at [1]

This is awful. They have some internal string format borrowed from a Zig library where the address of the item is in the low end of a pointer and the length is at the high end. Why are they doing that in 2026? It lets you save a few bytes at best. It doesn't enforce the Rust rule that strings must be strict UTF-8. It's totally alien to the safe way Rust handles strings.

[1] https://github.com/oven-sh/bun/blob/main/src/bun_core/string...
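
For anyone who hasn't seen this kind of layout, here is a rough sketch of packing an address and a length into one 64-bit word. The 48/16 bit split and the `PackedSlice` name are assumptions made for illustration; this is not Bun's actual definition, and it only stores integers rather than dereferencing anything.

```rust
// Hypothetical layout: low 48 bits hold the address, high 16 bits hold the length.
// The field widths are an assumption, not taken from the Bun source.
const ADDR_BITS: u32 = 48;
const ADDR_MASK: u64 = (1u64 << ADDR_BITS) - 1;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedSlice(u64);

impl PackedSlice {
    /// Returns `None` if the address does not fit in the low 48 bits.
    fn pack(addr: u64, len: u16) -> Option<Self> {
        if addr > ADDR_MASK {
            return None;
        }
        Some(PackedSlice(((len as u64) << ADDR_BITS) | addr))
    }

    fn addr(self) -> u64 {
        self.0 & ADDR_MASK
    }

    fn len(self) -> u16 {
        (self.0 >> ADDR_BITS) as u16
    }
}

fn main() {
    let p = PackedSlice::pack(0x7fff_1234_5678, 11).expect("address fits in 48 bits");
    assert_eq!(p.addr(), 0x7fff_1234_5678);
    assert_eq!(p.len(), 11);
    // One machine word instead of a separate pointer and length.
    assert_eq!(std::mem::size_of::<PackedSlice>(), 8);
}
```

Nothing in the type says what the bytes behind the address are, or whether they are still alive, which is the part that feels alien next to `&str`.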

> It doesn't enforce the Rust rule that strings must be strict UTF-8

Judging by the name, nor should it, because OS-paths aren't always UTF-8. See for example the rust standard library type OsString https://doc.rust-lang.org/std/ffi/struct.OsString.html

The rust std library string is a reasonable default, but it's not always the right choice. Lots of projects use different things for good reasons.
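
A small sketch of the OsString point, assuming a Unix target (it uses `std::os::unix::ffi::OsStringExt`, so it will not compile on Windows). The helper name `non_utf8_name` is made up for the example.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt; // Unix-only extension trait

/// Builds a file name containing the byte 0xFF, which is never valid UTF-8
/// but is accepted in Unix file names.
fn non_utf8_name() -> OsString {
    OsString::from_vec(vec![b'f', b'o', 0xFF, b'o'])
}

fn main() {
    let name = non_utf8_name();
    // There is no lossless view of this name as a `&str`.
    assert!(name.to_str().is_none());
    // The lossy conversion substitutes U+FFFD for the invalid byte.
    assert_eq!(name.to_string_lossy(), "fo\u{FFFD}o");
    // The original bytes are still intact inside the OsString.
    assert_eq!(name.into_vec(), vec![b'f', b'o', 0xFF, b'o']);
}
```

So a path type that insisted on strict UTF-8 would be unable to represent names the OS is happy to hand you.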

For the same reason the V8 team bothered to set up a 32-bit addressing scheme for the GC heap even on 64-bit platforms, I imagine? The bytes add up when there’s enough of them.
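
Roughly the idea, as a toy sketch: store a 32-bit offset into one contiguous heap instead of a full 64-bit pointer. This is not V8's implementation; `Heap` and `Compressed` are names invented for the example.

```rust
/// Hypothetical 32-bit handle: an index into one heap, not a 64-bit pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Compressed(u32);

/// Toy heap whose cells are addressed by 32-bit handles.
struct Heap {
    cells: Vec<u64>,
}

impl Heap {
    fn alloc(&mut self, value: u64) -> Compressed {
        // A handle can only address as many cells as fit in a u32.
        assert!(self.cells.len() < u32::MAX as usize, "heap exceeds 32-bit handle range");
        let index = self.cells.len() as u32;
        self.cells.push(value);
        Compressed(index)
    }

    fn get(&self, handle: Compressed) -> u64 {
        self.cells[handle.0 as usize]
    }
}

fn main() {
    let mut heap = Heap { cells: Vec::new() };
    let a = heap.alloc(10);
    let b = heap.alloc(20);
    assert_eq!(heap.get(a), 10);
    assert_eq!(heap.get(b), 20);
    // Each stored reference costs 4 bytes; a raw pointer costs 8 on a 64-bit target.
    assert_eq!(std::mem::size_of::<Compressed>(), 4);
}
```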
The Bun founder/author has talked about this before as a temporary artifact: https://news.ycombinator.com/item?id=48141297
What they’ve done here isn’t safe either, and doesn’t have the consistent translation of c2rust.
Sure, but the point remains. They could've used Claude to build a Zig to Rust converter, ended up with something that was both deterministic _and_ beneficial to the wider community.
> The converted unsafe rust segfaulted at the same place the C code did

Isn't that exactly what you want from that kind of tool? Otherwise it means it's changing how stuff works and in this specific case that would be good but in most cases unwanted.

No string manipulation in C??? This is trivial stuff, guys. Do you think the CPU suddenly has a different ISA if you use Go? -_- If you build unsafe code, you will build unsafe code. Certain logic will be solved for you, and then you will fall flat on your face on the next piece of logic you need to do yourself. Why did you start programming?

It's not cynical, it's a real question. I'm not trying to be mean, but saying that something people have solved countless times shouldn't be done is a weird statement.

> emulate unsafe C pointer semantics with unsafe Rust

Yes and that would be safer than the current slop translation, because c2rust does these ugly things exactly to avoid introducing new issues with the more strict Rust semantics.

It _already was and should_ be that awful to match the original code without introducing hundreds of new UBs like the current situation.

> why not just build a translation tool for it?

They did ;) a highly dynamic one...

I mean, LLMs have been really good at translating code for a while now, which is why I'm more surprised that others are surprised this happened. They claim it's a marketing trick despite the fact that they have to manage and maintain a fork of Zig if they don't switch languages.
They don't have to do that at all. They could've used mainline Zig, without their vibe coded changes to it.
They changed Zig because it was inadequate for their efforts, why would I keep using a tool that is inadequate if there's a better tool?
They claim to have made the Zig compiler faster, which is disputed. Even if true that wouldn't make it inadequate without their changes.
Because maybe they're wrong, and what they think is "inadequate for their efforts" is just due to their overengineering stupidity.
Really? What is this, Reddit? If you are going to resort to name calling, at least provide some genuine facts. Show me what in their Zig fork was too much. You assume Zig is “finished” being built? Because that's a bold claim; it seems every major Zig change is some very major shift in the language. Zig is where Rust was before it did the borrow checker system.
“Tbh they could've just hooked up zig translate-c to c2rust”

This doesn’t work like you think it does. These things are full of errors and make the code very verbose and hard to reason about. It works with small apps, not entire rewrites.

+1 - I was making exactly this argument in other threads. But I have a slightly different take on how software should be written.

Translating zig -> rust is more complex than writing a JPEG parser in static python and then lowering it into zig and rust differently, using idiomatic constructs for each language.

Towards that end, I've created a parser for a dialect of python which is suitable for this purpose. It should maintain compatibility with the vast majority of python code out there, while picking up some rust/zig features that make translation easier. JPEG parser included in the assets of the skill for a flavor.

https://github.com/py2many/static-python-skill

https://github.com/py2many/spy-ast

Yeah, this is the same annoyance I have with AI psychosis. Deterministic tasks should be done by deterministic tools. The number of people I've seen translate Morse code using AI is far larger than it should be.
That would have been the proper way to port a codebase to another language, by parsing the syntax tree and applying deterministic and verified transformations.
At this point bun is a free adspace for Anthropic and Claude. This is the only thing that would explain the rewrite.
Because they aren't trying to raise billions of dollars to build a translation tool.