Hacker News new | ask | show | jobs
by brandmeyer 2575 days ago
There's a standard technique for this that is as old as variable-length ISA's. Yes, you've identified the fundamental problem with the optimistic approach.

You do start with a pessimistic approach and assume jump distances that are the maximum length instructions. On each pass, you restrict yourself to reducing the lengths of instructions. Each pass shrinks only those instructions which currently reach their targets and recomputes only those jump distances. Yes, you can construct pathological cases where you only get to compact one insn at a time.

It doesn't matter. The goal of the asssembler is not to produce the optimal encoding, it is to produce a valid encoding. The program was valid at every step of the way. So after running N passes, you just stop and emit the program you got.

Oddly enough, its the fixed-length RISC ISA's that have a bigger problem with this. A RISC-V branch usually only has 12 bits of jump distance available. That's good enough most of the time, but 5% of the time (varying widely with the program) the target isn't in range. To handle those cases, you have to emit a longer-distance unconditional jump, and invert the logic of the parent branch to hop over the long jump instead of falling through to the 'else' label. Essentially, its a 64-bit encoding of a branch instruction with 20 bits of jump distance. Pretty sure the RISC-V assembler assumes the case of 12-bit distances in 32-bit instructions at first.

Eventually, there is a pass limit for optimistic assemblers as well. But when the limit is reached, instead of emitting a program that is less than optimal, you just return an error, because you didn't have a valid program that you could emit at all.