by accrual
30 days ago
Pretty interesting. I wonder if a whitelist against certain columns in the output could help, e.g. this column can only contain valid x86 instructions (e.g. MOV is allowed, M0V is not), this column can only contain hexadecimal (1 is allowed but never "l"), etc. Probably more work than it's worth given the final line-by-line comparison that happens anyway.
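A minimal sketch of the per-column whitelist idea, assuming the scanned listing line has already been split into address, opcode and mnemonic columns. That split, the function name and the short mnemonic list are hypothetical, and it uses TMS9900 mnemonics since that is what the reply below is scanning:

```python
import re

# Hypothetical subset of TMS9900 mnemonics; a real whitelist would hold the whole instruction set.
MNEMONICS = {"MOV", "MOVB", "A", "AB", "B", "BL", "BLWP", "CLR", "DEC", "INC", "JMP", "LI"}

# Assumes the listing prints hex in upper case, so a lowercase "l" is always an error.
HEX_FIELD = re.compile(r"[0-9A-F]+")


def check_columns(address: str, opcode: str, mnemonic: str) -> list[str]:
    """Return the problems found in one listing line that is already split into columns."""
    problems = []
    if not HEX_FIELD.fullmatch(address):
        problems.append(f"address {address!r} is not hexadecimal")
    if not HEX_FIELD.fullmatch(opcode):
        problems.append(f"opcode {opcode!r} is not hexadecimal")
    if mnemonic not in MNEMONICS:
        problems.append(f"mnemonic {mnemonic!r} is not on the whitelist")
    return problems


print(check_columns("0012", "C001", "MOV"))  # → []
print(check_columns("00l2", "C001", "M0V"))  # flags the "l" and the "0"
```

Each column only has to answer "is this a legal value here?", which is cheap; it flags errors but does not say what the correct character was.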
I might do what you said and make it column-sensitive: a first-pass assembler that does spell checking and makes the corrections. M0V is a single substitution away from MOV; MOV8 is closest to MOVB. For registers, RO (R-oh) must be R0 (R-zero). But RO would also be valid as a symbol name (curse your poor choice of symbol name). Alas, R1 is defined in the symbol table as a mnemonic for 1.
This idiom occurs in TMS9900 assembly (of which I have 2100+ pages to scan).
Indexed addressing into the caller's register file: MOV @R1*2(R13),R0
Here R1 is 1, a small offset in words, so R1*2 is 2 bytes and the operand is the word just after the one R13 points to (the caller's R1). Yet @RI(R13) is also valid if RI is in the symbol table.
So there has to be some heuristic that starts with "Is RI a defined symbol?" and then "Can a symbol be used in this context?" If either answer is no, it is probably R1.
And R11 is used a lot.
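A sketch of that heuristic in Python, covering R11 as well. The function name, the `symbol_allowed` flag and the table of look-alike characters are hypothetical; deciding whether a symbol is legal in a given operand position is the hard part and is simply passed in here:

```python
import re

# Assumed OCR look-alikes for digits; extend as the scans demand.
DIGIT_LOOKALIKES = str.maketrans({"I": "1", "l": "1", "O": "0", "o": "0"})

# TMS9900 workspace registers R0..R15.
REGISTER = re.compile(r"R(1[0-5]|[0-9])")


def resolve_register_token(token: str, symbols: set[str], symbol_allowed: bool) -> str:
    """Hypothetical heuristic for OCR'd tokens such as RI, RO or RII.

    symbols: names defined in the listing's symbol table.
    symbol_allowed: whether a symbol can appear in this operand position.
    """
    # A defined symbol that fits the context is taken at face value.
    if token in symbols and symbol_allowed:
        return token
    # Otherwise, try reading the characters after "R" as digits.
    if token.startswith("R"):
        candidate = "R" + token[1:].translate(DIGIT_LOOKALIKES)
        if REGISTER.fullmatch(candidate):
            return candidate
    # No confident fix: leave it for the line-by-line comparison.
    return token


symbols = {"RI", "R1"}
print(resolve_register_token("RI", symbols, symbol_allowed=True))   # RI
print(resolve_register_token("RI", symbols, symbol_allowed=False))  # R1
print(resolve_register_token("RII", set(), symbol_allowed=True))    # R11
```

Tokens that don't turn into a real register (say a symbol like ROOT) are left alone rather than mangled into R00T.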
The same curse applies to people who used I as a counter variable in type-in programs: countless folks typed it as a 1 in expressions before magazines got better fonts.
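Back to the spell-checking pass for mnemonics: with plain edit distance, MOV8 is one edit from both MOV and MOVB, so "closest to MOVB" needs something extra. One option, sketched here as an assumption rather than anything the assembler actually does, is to make look-alike substitutions (8/B, 0/O, 1/l/I) cheaper than other edits. The confusion groups, the 0.25 weight and the short mnemonic list are made up for illustration:

```python
# Assumed OCR confusion groups; substituting within a group is cheap.
CONFUSABLE = [set("0O"), set("1lI"), set("8B"), set("5S"), set("2Z")]

# Hypothetical subset of TMS9900 mnemonics, kept in alphabetical order.
MNEMONICS = ["A", "AB", "B", "BL", "CLR", "DEC", "INC", "MOV", "MOVB"]


def sub_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    if any(a in group and b in group for group in CONFUSABLE):
        return 0.25
    return 1.0


def weighted_distance(s: str, t: str) -> float:
    """Levenshtein distance where look-alike substitutions cost 0.25 instead of 1."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + 1.0,  # delete s[i-1]
                cur[j - 1] + 1.0,  # insert t[j-1]
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),  # match or substitute
            )
        prev = cur
    return prev[len(t)]


def correct_mnemonic(word: str) -> str:
    # min() keeps the first of equal candidates, so ties go to the alphabetically first mnemonic.
    return min(MNEMONICS, key=lambda m: weighted_distance(word, m))


print(correct_mnemonic("M0V"))   # MOV
print(correct_mnemonic("MOV8"))  # MOVB
```

Without the cheaper look-alike substitutions, MOV8 would tie between MOV and MOVB and the tie-break would pick MOV, which is the wrong answer here.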