| HN Mirror

	by Retr0id 66 days ago
	Tokens can also be burnt on decompilation.

tptacek 66 days ago

Yes, and it apparently burns lots of tokens. But what I've heard is that the outcomes are drastically less expensive than hand-reversing was, when you account for labor costs.

jeffmcjunkin 66 days ago

Can confirm. Matching decompilation in particular (where you match the compiler along with your guess at source, compile, then compare assembly, repeating if it doesn't match) is very token-intensive, but it's now very viable: https://news.ycombinator.com/item?id=46080498
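The loop described above (guess source, compile, compare assembly, repeat) can be sketched roughly as follows. This is a minimal illustration, not any particular project's tooling. `propose` and `compile_to_asm` are hypothetical callables: the first would wrap an LLM prompt, and the second would run the same compiler and flags as the original binary and then disassemble the result. The normalization rules are also assumptions chosen for the example.

```python
import difflib
from typing import Callable, Optional


def normalize_asm(listing: str) -> list[str]:
    """Normalize an assembly listing so cosmetic differences are ignored.

    Assumed rules for this sketch: drop ';' comments, collapse whitespace,
    lowercase everything, and skip blank lines and label lines.
    """
    out = []
    for line in listing.splitlines():
        line = line.split(";", 1)[0]           # strip trailing comment
        line = " ".join(line.split()).lower()  # collapse whitespace
        if not line or line.endswith(":"):     # skip blanks and labels
            continue
        out.append(line)
    return out


def asm_matches(target: str, candidate: str) -> tuple[bool, str]:
    """Return (matched, unified diff of the normalized listings)."""
    t, c = normalize_asm(target), normalize_asm(candidate)
    if t == c:
        return True, ""
    diff = difflib.unified_diff(t, c, "target", "candidate", lineterm="")
    return False, "\n".join(diff)


def match_loop(
    target_asm: str,
    propose: Callable[[str], str],         # hypothetical: feedback -> source guess
    compile_to_asm: Callable[[str], str],  # hypothetical: source -> disassembly
    max_iters: int = 10,
) -> tuple[Optional[str], int]:
    """Guess, compile, compare, and retry until the assembly matches."""
    feedback = ""
    for attempt in range(1, max_iters + 1):
        source = propose(feedback)
        ok, diff = asm_matches(target_asm, compile_to_asm(source))
        if ok:
            return source, attempt
        feedback = diff  # the next guess sees exactly where it diverged
    return None, max_iters
```

The diff fed back on each failed iteration is what makes this approach token-intensive: every mismatch costs another prompt that contains the divergence.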

Of course LLMs see a lot more source-assembly pairs than even skilled reverse engineers, so this makes sense. Any area where you can get unlimited training data is one where we should expect top-tier performance from LLMs.

(also, hi Thomas!)

stackghost 66 days ago

My own experience has been that "ghidra -> ask LLM to reason about ghidra decompilation" is very effective on all but the most highly obfuscated binaries.

Burning tokens by asking the LLM to compile, disassemble, compare assembly, recompile, repeat seems very wasteful and inefficient to me.

mikestaas 66 days ago

LaurieWired did a good episode about that kind of thing https://www.youtube.com/watch?v=u2vQapLAW88

kimixa 66 days ago

That matches my experience too - LLMs are very capable at "translating" between domains - one of the best experiences I've had with LLMs is turning "decompiled" source into "human readable" source. I don't think "Binary Only" closed-source is the defense against this that some people here seem to think it is.

echelon 66 days ago

Has anyone used an LLM to deobfuscate compiled JavaScript?

lelanthran 66 days ago

> Has anyone used an LLM to deobfuscate compiled JavaScript?

Seems like a waste of money; wouldn't it be better to extract the AST deterministically, write it out, and only then ask an LLM to replace those auto-generated symbol names with meaningful ones?
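The deterministic-first approach suggested above can be sketched like this. Python's standard library has no JavaScript parser, so to stay self-contained this sketch swaps in Python's own `ast` module operating on Python source; for real bundles you would use a JavaScript parser such as Babel instead. `collect_identifiers` and `rename_identifiers` are hypothetical helper names, the mapping stands in for whatever names the LLM proposes, and the renamer is deliberately scope-unaware.

```python
import ast


def collect_identifiers(source: str) -> list[str]:
    """List variable, parameter, and function names to hand to the LLM.

    Builtins such as print are included too; the LLM can simply leave them
    out of its mapping.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.FunctionDef):
            names.add(node.name)
    return sorted(names)


class _Renamer(ast.NodeTransformer):
    """Apply a name mapping to Name, arg, and FunctionDef nodes only.

    Scope-unaware: every occurrence of a name is renamed, which is fine for
    this illustration but not for code that reuses names across scopes.
    """

    def __init__(self, mapping: dict[str, str]):
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # continue into parameters and body
        return node


def rename_identifiers(source: str, mapping: dict[str, str]) -> str:
    """Parse, rename on the AST, and print the code back out (Python 3.9+)."""
    tree = _Renamer(mapping).visit(ast.parse(source))
    return ast.unparse(tree)
```

Because renaming happens on syntax nodes rather than raw text, a string literal that happens to contain a short name like 'a' is left alone, which a grep/sed pass would not guarantee. The LLM only has to return a small name-to-name mapping rather than rewrite the whole file, which is where the token savings would come from.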

heeen2 66 days ago

Yes, but it requires some nudging if you don't want to waste tokens. It will happily grep and sed through massive JavaScript bundles, but if you tell it to first create tooling like Babel scripts to format them, it will be much quicker.

echelon 65 days ago

> but if you tell it to first create tooling like babel scripts to format, it will be much quicker.

Can you expand on this? Is that existing tooling for deminification?

heeen2 58 days ago

For me it was custom scripts that look for data in minified bundles and refactor them for easier protocol reverse engineering, e.g. https://github.com/echtzeit-solutions/monsgeek-akko-linux/bl...

saagarjha 66 days ago

I've used it for hobby efforts on Electron/React Native (Hermes bytecode) apps and it seems to work reasonably well

bitexploder 66 days ago

Yep. They are good at it.

gfosco 66 days ago

Yeah, it's token intensive but worth it. I built a very dumb example harness which used IDA via MCP and analyzed/renamed/commented all ~67k functions in a binary, using Claude Haiku for about $150. A local model could've accomplished it for much less/free. The knowledge base it outputs and the marked up IDA db are super valuable.

whattheheckheck 66 days ago

Do you have the repo example?

heeen2 66 days ago

I did something similar using ghidramcp for digging around this keyboard firmware, repo contains the ghidra project, linux driver and even patches to the original stock fw. https://github.com/echtzeit-solutions/monsgeek-akko-linux

somesortofthing 66 days ago

Another asymmetric advantage for defenders - attackers need to burn tokens to form incomplete, outdated, and partially wrong pictures of the codebase while the defender gets the whole latest version plus git history plus documentation plus organizational memory plus original authors' cooperation for free.

high_na_euv 66 days ago

>original authors' cooperation

>for free.

Haha, it is more complicated in reality

echelon 66 days ago

> Tokens can also be burnt on decompilation.

Prediction 1. We're going to have cheap "write Photoshop and AutoCAD in Rust as a new program / FOSS" soon. No desktop software will be safe. Everything will be cloned.

Prediction 2. We'll have a million Linux and Chrome and other FOSS variants with completely new codebases.

Prediction 3. People will trivially clone games, change their assets. Modding will have a renaissance like never before.

Prediction 4. To push back, everything will move to thin clients.

jgraham 66 days ago

I think if prediction 1 is true (that it becomes cheap to clone existing software in a way that doesn't violate copyright law), the response will not be purely technical (moving to thin clients, or otherwise trying to technically restrict the access surface to make reverse engineering harder). Instead I'd predict that companies look to the law to replace the protections that they previously got from copyright.

Obvious possibilities include:

* More use of software patents, since these apply to underlying ideas, rather than specific implementations.

* Stronger DMCA-like laws which prohibit breaking technical provisions designed to prevent reverse engineering.

Similarly, if the people predicting that humans are going to be required to take ultimate responsibility for the behaviour of software are correct, then it clearly won't be possible for that to be any random human. Instead you'll need legally recognised credentials to be allowed to ship software, similar to the way that doctors or engineers work today.

Of course these specific predictions might be wrong. I think it's fair to say that nobody really knows what might have changed in a year, or where the technical capabilities will end up. But I see a lot of discussions and opinions that assume zero feedback from the broader social context in which the tech exists, and those seem likely to be missing a big part of the picture.