HN Mirror

	by vidarh 1187 days ago
	ChatGPT is pretty good at disassembling x86 and can give reasonable descriptions of what the code is doing (e.g., try "disassemble the following bytes in hex and explain what they appear to be doing: [bytes from a binary in hex]"). I'm curious how soon someone will use these models to effectively ruin the ability to use binary-only releases as an obfuscation method.
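
	For anyone who wants to try the prompt above, here is a minimal Python sketch of how the "[bytes from a binary in hex]" part could be filled in. The function name `build_prompt` and the sample bytes are assumptions made for illustration; the sketch only builds the prompt text and does not call any model.

```python
def build_prompt(code: bytes) -> str:
    """Format raw machine code as space-separated hex and wrap it in the prompt."""
    hex_bytes = " ".join(f"{b:02x}" for b in code)
    return (
        "disassemble the following bytes in hex and explain "
        f"what they appear to be doing: {hex_bytes}"
    )


# Hypothetical sample: an empty x86-64 function
# (push rbp; mov rbp, rsp; pop rbp; ret).
SAMPLE = bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3])

prompt = build_prompt(SAMPLE)
print(prompt)
```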

2 comments

datpiff 1187 days ago

Using ChatGPT as a disassembler seems like a dumb idea when free disassemblers already exist. What possible advantage does it give?

vidarh 1187 days ago

The ability to explain the code and extract a higher-level understanding. Disassembling into raw instructions is the most trivial part of reverse engineering an application. Hence the "and explain what they appear to be doing" bit.

For the pieces I've tested, it often recognised the source language and could give ideas about what the code was for and what it did.

datpiff 1187 days ago

> Disassembling into raw instructions is the most trivial part

So why not do it in the proven-correct tools and give ChatGPT the instructions?

I'm all for finding neat use cases but I wouldn't use an AI chatbot as a calculator...

vidarh 1187 days ago

You could do that too, but that entirely misses the point, which is that ChatGPT is capable of inferring higher-level semantics from the instructions and explaining what the code is doing. You're getting hung up on a minor, unimportant detail.

datpiff 1187 days ago

Apparently the point is proving it's possible. Not making it useful.

vidarh 1186 days ago

No, that was not the point at all.

The point is that ChatGPT understands the code well enough to explain what it does, so there's reason to wonder how soon someone will leverage that in a disassembler to produce far better output, to the point where releasing "only" the binary as an obfuscation mechanism stops being viable.

E.g., the additional value would be comments about purpose, labels for function entry points that make sense for what the function does, labels for data that make sense for what is stored there, and comments explaining why things are structured the way they are.

Having reverse engineered programs from binary in the past, inferring good names and commenting the sources is the vast majority of the effort.
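
To make that concrete, here is a small Python sketch of what merging such labels and comments into a listing could look like. Everything in it is an assumption for illustration: the `annotate` helper, the listing format, and the hand-written `square_int` label and comment, which stand in for what a model might suggest.

```python
def annotate(listing, labels, comments):
    """Merge labels and comments into a plain disassembly listing.

    listing:  list of (address, instruction text) pairs
    labels:   {address: name} for function entry points or data
    comments: {address: text} explaining individual instructions
    """
    lines = []
    for addr, insn in listing:
        if addr in labels:
            lines.append(f"{labels[addr]}:")
        line = f"  {addr:#06x}  {insn}"
        if addr in comments:
            # Pad the instruction column so the comments line up.
            line = f"{line:<40}; {comments[addr]}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical listing, as a conventional disassembler might print it.
LISTING = [
    (0x1000, "push rbp"),
    (0x1001, "mov rbp, rsp"),
    (0x1004, "mov eax, edi"),
    (0x1006, "imul eax, edi"),
    (0x1009, "pop rbp"),
    (0x100A, "ret"),
]
# Hand-written stand-ins for model-suggested annotations.
LABELS = {0x1000: "square_int"}
COMMENTS = {0x1006: "eax = x * x", 0x100A: "result is returned in eax"}

annotated = annotate(LISTING, LABELS, COMMENTS)
print(annotated)
```

The disassembly itself still comes from a conventional tool here; the part that takes the effort is filling `LABELS` and `COMMENTS` with names that actually make sense.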

circuit10 1186 days ago

It will struggle to understand small parts of big programs without seeing the full context, though maybe you could get around that by having it generate some sort of summary for itself.

vidarh 1186 days ago

My tests on small parts of big programs suggest that, if anything, it does far better than I expected, but you're probably right that it would struggle with many things if you tried turning it into a bigger tool, and having it generate summaries is probably essential. While we can "fake" some level of memory that way, I would really like to see how far LLMs can go if we give them a more flexible form of memory...