by RolfRolles 499 days ago
No, it's really not a strict improvement. A meaningless name like `v2` does at least convey that you, as the analyst, haven't understood the role of the variable well enough to rename it to something that fits its inferred purpose. If the LLM comes up with an "informative" name that doesn't match what the variable actually does, the name can waste your time by misleading you about its role.

I think Ghidra could do better even without any LLM involved. Ghidra will define local variables like this:

SERVICE_TABLE_ENTRY* local_5c;

I wish it at least did something like:

SERVICE_TABLE_ENTRY* local_5c_pServiceTableEntry;

Oh yeah, there’s probably some plugin or Python script to do this. But I just dabble with Ghidra in my spare time.
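
For illustration, here is a rough sketch of how such a name could be derived from the declared type. It is plain Python and does not touch Ghidra's scripting API; the function names `type_suffix` and `suggested_name` are made up for this example, and the rule "one `p` per pointer level, then the type name in CamelCase" is only an assumption about the convention being asked for.

```python
import re

def type_suffix(type_name):
    """Build a suffix from a C type string, e.g.
    'SERVICE_TABLE_ENTRY*' -> 'pServiceTableEntry' (assumed convention)."""
    pointer_depth = type_name.count("*")
    base = type_name.replace("*", "").strip()
    # Split on underscores and whitespace, dropping empty pieces.
    parts = [p for p in re.split(r"[_\s]+", base) if p]
    # ALL-CAPS pieces become Capitalized; mixed-case pieces keep their case.
    camel = "".join(
        p.capitalize() if p.isupper() else p[:1].upper() + p[1:]
        for p in parts
    )
    return "p" * pointer_depth + camel

def suggested_name(default_name, type_name):
    """Append the type-derived suffix to an auto-generated name."""
    return default_name + "_" + type_suffix(type_name)

print(suggested_name("local_5c", "SERVICE_TABLE_ENTRY*"))
```

Wiring this into Ghidra would still need a script that walks the local variables of each function and renames them, which is not shown here.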

It would be great if it tracked the origin of a variable/parameter name, and could show them in a different colour (or some other visual distinction) based on their origin. That way you could easily distinguish “name manually assigned by analyst” (probably correct) vs “name picked by some LLM” (much more tentative, could easily be a hallucination).
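
A minimal sketch of what that bookkeeping might look like, assuming each name simply carries a tag recording where it came from. Everything here (`NameOrigin`, `VariableName`, the colour choices) is hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass
from enum import Enum

class NameOrigin(Enum):
    DEFAULT = "default"   # auto-generated, e.g. local_5c
    ANALYST = "analyst"   # typed in by a human
    LLM = "llm"           # suggested by a model, not yet verified

# Hypothetical colour scheme; any visual distinction would do.
ORIGIN_COLOUR = {
    NameOrigin.DEFAULT: "grey",
    NameOrigin.ANALYST: "green",
    NameOrigin.LLM: "orange",
}

@dataclass
class VariableName:
    text: str
    origin: NameOrigin = NameOrigin.DEFAULT

    def rename(self, text, origin):
        # Every rename records who (or what) chose the new name.
        self.text = text
        self.origin = origin

    def colour(self):
        return ORIGIN_COLOUR[self.origin]

v = VariableName("local_5c")
v.rename("pServiceTable", NameOrigin.LLM)
print(v.text, v.colour())
```

An analyst confirming an LLM suggestion would then just be another `rename` with the same text and `NameOrigin.ANALYST`.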

In my view one of the most pressing shortcomings of Ghidra is that it can't understand the lifetimes of multiple variables with overlapping stack addresses: https://github.com/NationalSecurityAgency/ghidra/issues/975

Ghidra does have an extensive scripting API, and I've used LLMs to help me write scripts to do bulk changes like you've described. But you would have to think about how you would ensure the name suffix is synchronized as you retype variables during your analysis.
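
One way to think about the synchronization problem, sketched in plain Python rather than against the Ghidra API: only rewrite the suffix if the name still looks auto-generated, and leave anything the analyst has touched alone. The prefix pattern (`local_<hex>` or `param_<n>`) and the helper name `resync_name` are assumptions made for this example.

```python
import re

# Assumed shape of auto-generated prefixes, e.g. local_5c or param_1,
# optionally followed by "_<suffix>".
_PREFIX = re.compile(r"^((?:local|param)_[0-9a-fA-F]+)(?:_(.*))?$")

def resync_name(current_name, old_suffix, new_suffix):
    """Return the name to use after a retype.

    The suffix is replaced only when it is missing or still equals the
    suffix generated for the old type; hand-edited names are kept.
    """
    m = _PREFIX.match(current_name)
    if m is None:
        return current_name          # not following the convention at all
    prefix, suffix = m.group(1), m.group(2)
    if suffix is not None and suffix != old_suffix:
        return current_name          # suffix was changed by hand
    return prefix + "_" + new_suffix

print(resync_name("local_5c_pServiceTableEntry",
                  "pServiceTableEntry", "pServiceStatus"))
```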

Yeah, I don't know why they don't use something like SSA (static single assignment), where every line of code that performs an assignment creates a new local variable.

Although I suppose when decompiling to C, you need to translate it out of SSA form when you encounter loops or backwards control flow.
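
To make the idea concrete, here is a toy sketch of SSA-style renaming for straight-line code only. The input format (a list of `(target, operands)` pairs) and the function name `to_ssa` are invented for this example; it says nothing about how any real decompiler is implemented.

```python
def to_ssa(assignments):
    """Rename straight-line assignments so each target is defined once.

    `assignments` is a list of (target, operands) pairs, where operands
    are variable names or literals. Control flow is not handled.
    """
    version = {}   # variable name -> latest version number
    out = []
    for target, operands in assignments:
        # Each use refers to the most recent definition of that variable.
        renamed = [
            f"{op}_{version[op]}" if op in version else op
            for op in operands
        ]
        # Each assignment creates a fresh version of the target.
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}_{version[target]}", renamed))
    return out

# x = 1; x = x + 2; y = x
for target, operands in to_ssa([("x", ["1"]), ("x", ["x", "2"]), ("y", ["x"])]):
    print(target, "<-", operands)
```

Once a loop is involved, the value of `x` at the loop header can come from either the version before the loop or the version from the previous iteration, which SSA expresses with a phi node. C has no such construct, so the versions have to be merged back into one variable at that point.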