| I've written a (mostly) fully-featured x86 assembler. There are two gotchas that most people starting out probably won't realize until it's too late. 1) Encoding instructions on x86 can be more challenging than they first appear. Understanding how this works and fits together will save you a lot of headaches. 2) Forward jumps are a pain. Any reasonable assembler will provide symbolic labels for jumps and other operations. Because x86 instructions are variable length, you can't simply count up the lines to the label and multiply by some fixed amount. It doesn't work that way. Instructions can range from one byte to 15 bytes. What this means to you is that you'll need to design a multi-pass assembler. You allocate a fixed amount of space for the jump instruction and come back later to "patch" that jump instruction with the correct byte offset. Labels simply become keys in a dictionary that record the byte offset into the assembled binary data. Other things to make sure you understand: big/little endian, LSB/MSB, how two's-complement works, etc. One surprising aspect of writing an assembler is that you learn to decode instructions by simply looking at a hex dump of the binary output. You almost feel like Neo in the Matrix, in more nerdy less awesome sort of way. You start thinking of assembler and macros as merely convenience of having to write out hex (or binary) of the instructions you want. And on top of that, you see C as convenience for having to do assembler. |
Also worth noting there are "short" and "near" jumps, which trade off between a shorter encoding and a longer range of destination; if the target is a forward reference, not yet known at the time the jump is assembled, then nearly all assemblers will use the long variant, because they don't know yet whether the destination will be close enough. One notable exception is fasm, which starts off with being "optimistic" about jump sizes, and then increases only the ones which didn't quite make it, repeatedly, until all the offsets are large enough. Here's a series of very detailed posts about that from its author:
https://board.flatassembler.net/topic.php?t=20249