by tibiapejagala 4089 days ago
Seconding this. I've implemented part of x86 instruction encoding, and the resources I found were either:

* comprehensible, but far from complete (some blogs)

* complete, but hard to understand and reliant on implicit knowledge (the Intel manual or [1])

Rather than a disassembler, I recommend writing a simple JIT compiler, with [2] as a starting point. You skip some problems this way, since you only have to emit the instructions you actually use.
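The part of a JIT that overlaps with instruction encoding is mostly appending bytes to a buffer. Below is a minimal Python sketch covering just two instructions: `mov r32, imm32` (opcode B8+rd, i.e. the register number added to B8, followed by a little-endian 32-bit immediate) and `ret` (C3). The function names are hypothetical, and actually running the bytes requires copying them into executable memory, which this sketch leaves out.

```python
import struct

# Register numbers as used in x86 opcode and ModRM fields (32-bit names).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def mov_reg_imm32(reg: str, value: int) -> bytes:
    # MOV r32, imm32 is B8+rd: the register number is added to the opcode byte,
    # then the immediate follows in little-endian order.
    # Values are truncated to 32 bits (negative numbers become two's complement).
    return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", value & 0xFFFFFFFF)

def ret() -> bytes:
    return b"\xc3"  # near RET

def jit_return_constant(value: int) -> bytes:
    # Machine code for a function body equivalent to `return value;` (result in EAX).
    return mov_reg_imm32("eax", value) + ret()
```

For example, `jit_return_constant(42).hex(" ")` gives `b8 2a 00 00 00 c3`, the same bytes an assembler emits for `mov eax, 42` followed by `ret`.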

[1] http://ref.x86asm.net/ this seems pretty cool as a reference, but I can't wrap my head around it

[2] http://eli.thegreenplace.net/2013/11/05/how-to-jit-an-introd...


I use that first reference extensively.

But you have to understand that it's just a reference; it doesn't give you the complete picture. It shows you the important details once you already know where to look.

I've written partial disassemblers/assemblers. And that site has been a huge help to me.

My 2 cents:

Start by being able to decode the mov instruction with all of its possible memory encodings. Once you understand how to parse the x86 memory addressing scheme (ModRM, SIB, displacement), everything gets a whole lot easier. And I agree that writing an assembler first is probably easier: a disassembler has to be complete to be useful, but an assembler doesn't need to support every instruction to work.
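To illustrate the addressing scheme, here is a rough Python sketch that decodes only the two register/memory forms of 32-bit `mov` (opcodes 89 and 8B). It assumes 32-bit addressing and no prefixes; in 64-bit mode, mod=00 with rm=101 means RIP-relative rather than an absolute disp32, and REX bits extend the register fields, neither of which is handled here. The function names are made up for this example.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_modrm32(code: bytes, i: int):
    """Decode the ModRM byte at code[i], plus any SIB byte and displacement,
    assuming 32-bit addressing. Returns (reg_field, rm_text, next_index)."""
    modrm = code[i]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    i += 1
    if mod == 0b11:                      # r/m names a register, not memory
        return reg, REG32[rm], i

    parts = []
    disp_size = {0b00: 0, 0b01: 1, 0b10: 4}[mod]
    if rm == 0b100:                      # a SIB byte follows
        sib = code[i]
        i += 1
        scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
        if base == 0b101 and mod == 0b00:
            disp_size = 4                # no base register, disp32 instead
        else:
            parts.append(REG32[base])
        if index != 0b100:               # index 100 means "no index"
            parts.append(f"{REG32[index]}*{scale}")
    elif rm == 0b101 and mod == 0b00:
        disp_size = 4                    # bare [disp32], no base register
    else:
        parts.append(REG32[rm])

    disp = 0
    if disp_size == 1:
        disp = struct.unpack_from("<b", code, i)[0]   # disp8, sign-extended
    elif disp_size == 4:
        disp = struct.unpack_from("<i", code, i)[0]   # disp32, little-endian
    i += disp_size

    text = " + ".join(parts)
    if disp_size and not parts:
        text = f"{disp & 0xFFFFFFFF:#x}"              # absolute address
    elif disp_size and disp < 0:
        text += f" - {-disp:#x}"
    elif disp_size:
        text += f" + {disp:#x}"
    return reg, f"[{text}]", i

def decode_mov(code: bytes) -> str:
    """Disassemble one MOV with opcode 89 (r/m32 <- r32) or 8B (r32 <- r/m32)."""
    opcode = code[0]
    reg, rm_text, _ = decode_modrm32(code, 1)
    if opcode == 0x89:
        return f"mov {rm_text}, {REG32[reg]}"
    if opcode == 0x8B:
        return f"mov {REG32[reg]}, {rm_text}"
    raise ValueError(f"opcode {opcode:#04x} is not handled in this sketch")
```

For example, `decode_mov(bytes.fromhex("8b4508"))` returns `"mov eax, [ebp + 0x8]"`, and `8b 04 8e` goes through the SIB byte to give `"mov eax, [esi + ecx*4]"`.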

I wrote a pretty complete assembler a few years back. My advice: if you want to truly learn the encoding, write an assembler. While you're checking whether your assembler generates the correct instructions, you'll be staring at hex dumps for days or weeks. Pretty soon you'll start noticing prefixes, and you'll be able to decode instructions visually just from the hex bytes. It's really not that hard after a little practice, once you know ModRM.
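The prefixes you start to notice in those hex dumps come from a small fixed set. A hypothetical helper that peels them off the front of an instruction might look like this in Python; the byte values are the legacy prefixes plus the REX range 40-4F, which is only a REX prefix in 64-bit mode (in 32-bit mode those bytes are inc/dec opcodes). F2/F3 also serve as mandatory prefixes for some SSE instructions, which this sketch does not distinguish.

```python
# Legacy prefixes: lock/rep, segment overrides, operand- and address-size overrides.
LEGACY_PREFIXES = {
    0xF0: "lock", 0xF2: "repne", 0xF3: "rep",
    0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x26: "es", 0x64: "fs", 0x65: "gs",
    0x66: "operand-size", 0x67: "address-size",
}

def split_prefixes(code: bytes, long_mode: bool = True):
    """Peel prefix bytes off the front of one instruction.
    Returns (legacy_prefix_names, rex_byte_or_None, index_of_opcode)."""
    names = []
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        names.append(LEGACY_PREFIXES[code[i]])
        i += 1
    rex = None
    # In 64-bit mode, a REX byte (40-4F) must come immediately before the opcode.
    if long_mode and i < len(code) and 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        i += 1
    return names, rex, i

def describe_rex(rex: int) -> str:
    # REX is 0100WRXB: W selects 64-bit operand size; R, X and B extend
    # ModRM.reg, SIB.index and ModRM.rm/SIB.base respectively.
    bits = [name for name, mask in (("W", 8), ("R", 4), ("X", 2), ("B", 1)) if rex & mask]
    return "REX." + "".join(bits) if bits else "REX"
```

For example, `48 89 c8` (`mov rax, rcx`) splits into a REX.W prefix with the opcode at index 1, while `66 89 c8` (`mov ax, cx`) carries an operand-size prefix instead.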

I will emphasize: the Intel manual is pretty much all I used, along with NASM. I read the NASM source a lot to figure out what it did, and I also used NASM to check my generated instructions against. The Intel manual is critical; I would go straight to the authoritative source. It's not hard to follow once you understand the terminology and format a bit. Just keep reading it.
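For the compare-against-NASM workflow, a small diff helper that points at the first byte where your output and NASM's disagree can save some squinting. This is a Python sketch with hypothetical names; it assumes you already have NASM's bytes, for example by reading a flat binary produced with `nasm -f bin`.

```python
def first_mismatch(ours: bytes, reference: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(ours, reference)):
        if a != b:
            return offset
    if len(ours) != len(reference):
        # One output is a prefix of the other; they diverge where the shorter ends.
        return min(len(ours), len(reference))
    return None

def report(ours: bytes, reference: bytes, context: int = 4) -> str:
    """Describe the first mismatch with a few bytes of surrounding context."""
    off = first_mismatch(ours, reference)
    if off is None:
        return "identical"
    lo = max(0, off - context)
    return (f"first difference at offset {off:#x}\n"
            f"  ours: {ours[lo:off + context].hex(' ')}\n"
            f"  nasm: {reference[lo:off + context].hex(' ')}")
```

A classic mismatch this catches: `mov eax, [ebp]` cannot be encoded with mod=00 and rm=101, because that combination means a bare disp32, so NASM emits `8b 45 00` (mod=01 with a zero disp8). Calling `report(bytes.fromhex("8b05"), bytes.fromhex("8b4500"))` flags offset 0x1.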

edit: Oh, and the most important thing to ever know: Intel (x86) is little-endian! I cannot stress this enough. Even when you know it, it's very easy to forget when looking at code.
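To make the little-endian point concrete, here is a short Python sketch using the standard `struct` module: multi-byte immediates and displacements are stored least significant byte first, so a value and its bytes in a hex dump look reversed. The instruction bytes are `mov ecx, [0x12345678]` in 32-bit mode.

```python
import struct

# A 32-bit value as it appears in the instruction stream: low byte first.
encoded = struct.pack("<I", 0x12345678)
print(encoded.hex(" "))   # 78 56 34 12

# Reading bytes from a dump back into a number.
value = int.from_bytes(bytes.fromhex("78563412"), "little")
print(hex(value))         # 0x12345678

# Small negative displacements show up as fc ff ff ff and so on.
minus_four = int.from_bytes(bytes.fromhex("fcffffff"), "little", signed=True)

# 8b 0d <disp32>: the displacement starts after the opcode and ModRM bytes.
instr = bytes.fromhex("8b0d78563412")
disp = struct.unpack_from("<I", instr, 2)[0]
```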