| The end goal for DARPA's project is safe Rust. Translate unsafe constructs to safe constructs. C arrays of unknown length become Rust arrays or Vec items of known length. Pointer arithmetic becomes slices. (This covers most string manipulation, a high-risk activity in C.) Ownership translation probably will involve too much Rc/RefCell at first. The key to this is using an LLM to recognize idioms, then formal methods to verify that the LLM guessed right. Which is what the current project is trying to do. For example, consider C code with a function: void foo(int* tab, size_t len) { ... }
TRACTOR needs to do something like:- Look at foo and its callers. Determine whether "tab" is an array or an indirect pass of a single integer. - Look at foo and see if tab is ever modified. That determines whether it is mutable.
It may be necessary to go down the call chain. This analysis may not be able to determine if it is mutable, but if determines it is immutable, tab can be passed immutably. - Ask the LLM to figure out the size of tab from context in both foo and its callers. Is len the length of tab? That's a common idiom in C an LLM can recognize, but it might be a wrong guess. Is it the length in bytes or in ints? Remember, the programmer had to be able to figure this out, so the LLM has a good chance of doing it. - If tab is immutable, hypothesize that the correct translation is fn foo(tab: &[int]) { ... }
and within "foo", len comes from tab.len()Now attempt to use "lifting" formal analysis, as mentioned in the paper, to see if that changes the semantics of the program. Does "foo" subscript out of range? Can you prove that it doesn't? This problem needs both LLMs and formal methods. LLMs to infer intent, and formal methods to check that inference. |