Hacker News new | ask | show | jobs
by petee 68 days ago
What would be the difficulty level for it to just read the machine code; are these models heavily relying on human language for clues?
5 comments

Reasoning on pure machine code or disassembly is still hit and miss. For better results you can run the binary through a disassembler, then ask an llm to turn that into an equivalent c program, then ask it to work on that. But some of the subtleties might get lost in translation
If you put codex in Xhigh and allow it access to tools, it will take an hour but it will eventually give you back quality recompiled code, with the same issues the original had (here quality means readable)
I had a bit of a pain of a time trying to get Claude to work with ghidra. What you’re describing seems like a better alternative, would you agree?
You can tweak the current Ghidra MCP to work in headless mode. It makes things much easier.
I've had a lot of luck with pyghidra-mcp -- give it a try :)
Well i have tried and it only works for simple use-case.
I had Claude reverse some firmware. I gave it headless ghidra and it spat out documentation for the internal serial protocol I was interested in. With the right tools, it seems to do pretty well with this kind of task.
Paired with Ghidra having a binary, being able to do a memory dump of a live running program, and being able to use wireshark to dump traffic over network/bluetooth/usb is VERY helpful if you don't have the source code.

You use decompilation tools and hope they left debug symbols in and it turns it into somewhat human-readable language which is often enough. Even when you don't binaries use libraries which are known or at some point hit documented interfaces so things can be reasoned about.

It will have to use a disassembler, or write one. I recently casually asked gpt-5.4 to translate the content of a MIDI file to a custom sound programming language. It just wrote a one-shot MIDI parser in Python, grabbed the data, and basically did a perfect translation at first try. Nice.
I've seen Claude do similar things for image files. Don't have PNG parsing utilities installed? No worries, it'll just synthesize a Python script to decode the image directly.
I have had Claude read usbpcap to reverse engineer an industrial digital camera link. It was like pulling teeth but I got it done (I would not have been able to do it alone)