Hacker News
by ralph 5098 days ago
STA is `store accumulator' not address. Would you have liked to type STORE_ACCUMULATOR as often as the instruction occurs, in an editor that had no auto-complete? Even reading it is slower than STA. And having them all be three letters meant assemblers could pack the text into memory in fixed-length records; every byte mattered.

Here's the table of ARM mnemonics in the source to Acorn's BBC BASIC for the ARM. https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources... For the 6502, space was tight enough that it was packed to less than three bytes per mnemonic.
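
To illustrate how a three-letter mnemonic can fit in less than three bytes, here is a minimal sketch. It assumes 5 bits per letter (A=1 .. Z=26), giving 15 bits, which fits in two bytes. That scheme is an assumption for illustration and not necessarily the one BBC BASIC used; `pack_mnemonic` and `unpack_mnemonic` are hypothetical names.

```python
def pack_mnemonic(m: str) -> int:
    """Pack three ASCII letters into a 15-bit value (assumed scheme: 5 bits each)."""
    if len(m) != 3 or not m.isascii() or not m.isalpha():
        raise ValueError("expected three ASCII letters")
    value = 0
    for ch in m.upper():
        # Map A..Z to 1..26 and shift the previous letters up by 5 bits.
        value = (value << 5) | (ord(ch) - ord("A") + 1)
    return value  # always below 1 << 15, so two bytes are enough


def unpack_mnemonic(value: int) -> str:
    """Reverse pack_mnemonic, returning the upper-case mnemonic."""
    letters = []
    for shift in (10, 5, 0):
        letters.append(chr(((value >> shift) & 0x1F) + ord("A") - 1))
    return "".join(letters)
```

With this layout a table entry costs two bytes for the name instead of three, at the price of a few shifts and masks when matching.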

ARM was born from Acorn's frustration with 16-bit CPUs that they considered as successors to the 6502, e.g. 68000, not the 6502.

2 comments

You're right https://sites.google.com/site/6502asembly/6502-instruction-s...

"ARM was born from Acorn's frustration with 16-bit CPUs that they considered as successors to the 6502, e.g. 68000, not the 6502" Yes and no; we're both right. Acorn looked at 16-bit replacements for the 6502 and found that options such as the 68000 did not have the performance they wanted. They went to America, checked out the work on the replacement for the 6502, and concluded that they could just make their own CPU, and so they did.

Had the replacement for the 6502 not been the work of a one-man team, history would be different now.

>  And having them be all three letters meant assemblers could pack the text into memory in fixed-length records; every byte mattered.

As much as I agree with using the mnemonics, this is a bogus argument. Even C64 BASIC tokenized stuff before storing it, because there's no reason to store the name at all. In fact, if you prefer, the 6502 instruction set is small enough that the assembler's editor could represent each mnemonic as a single-byte index into an array. Or you could just use the opcode itself.
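
A rough sketch of that single-byte-index idea, under assumptions: the mnemonic list is a hypothetical subset (a real table would list every mnemonic), and `tokenise`/`detokenise` are made-up names for what an editor would do when storing and listing a line.

```python
# Hypothetical subset of 6502 mnemonics; the list position is the token.
MNEMONICS = ["ADC", "AND", "ASL", "LDA", "LDX", "STA", "STX"]
INDEX = {name: i for i, name in enumerate(MNEMONICS)}


def tokenise(line: str) -> bytes:
    """Store a source line as one token byte followed by the operand text."""
    mnemonic, _, operand = line.strip().partition(" ")
    token = INDEX[mnemonic.upper()]  # raises KeyError for unknown mnemonics
    return bytes([token]) + operand.strip().encode("ascii")


def detokenise(stored: bytes) -> str:
    """Rebuild the listing text from the stored form."""
    name = MNEMONICS[stored[0]]
    operand = stored[1:].decode("ascii")
    return f"{name} {operand}" if operand else name
```

Here `tokenise("sta $0400")` stores six bytes where the plain text needs nine, and the listing can still show STA because the name is recovered from the table.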

BBC BASIC tokenised BASIC keywords before the line was stored in memory, e.g. PRINT was represented by a single byte. Some tokens needed more than one byte, especially in later versions.

But I'm talking about assembler here. BBC BASIC has a built-in assembler for the 6502, Z80, or ARM, depending on the CPU it's running on. The assembler source in the BASIC program is not tokenised on input but stored as plain text. The embedded lines of assembler, wrapped in [...], are themselves lines of BASIC; when they are run, the machine code is assembled at the address in BASIC's P% integer register variable and P% is moved on. At that point of execution BASIC must hunt in its table for the mnemonic, which is stored in the "tokenised" BASIC line as plain text; for ARM BASIC, that is the table I referenced. That table can be laid out as it is because each mnemonic is three characters long, e.g. mov, ldr, stm, and bic.
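
Roughly, the lookup described above can be sketched like this. It is an illustration under assumptions, not the BBC BASIC source: the table holds only the four mnemonics named above, the "encoding" written to memory is a placeholder (the record index), and `find_mnemonic`, `assemble_line` and `p` (standing in for P%) are hypothetical names.

```python
RECORD = 3
# Fixed-length records: every mnemonic occupies exactly three characters,
# so record n starts at offset n * RECORD with no separators or lengths.
TABLE = "movldrstmbic"


def find_mnemonic(text: str) -> int:
    """Return the record index of the mnemonic at the start of text, or -1."""
    key = text[:RECORD].lower()
    for offset in range(0, len(TABLE), RECORD):
        if TABLE[offset:offset + RECORD] == key:
            return offset // RECORD
    return -1


def assemble_line(text: str, memory: bytearray, p: int) -> int:
    """Assemble one plain-text line at address p; return the advanced p."""
    index = find_mnemonic(text)
    if index < 0:
        raise ValueError(f"unknown mnemonic: {text!r}")
    # Placeholder: a real assembler would build the instruction word here.
    memory[p:p + 4] = index.to_bytes(4, "little")
    return p + 4  # ARM instructions are four bytes, so the pointer moves on by 4
```

Calling `assemble_line` once per source line, feeding each returned `p` into the next call, mirrors how P% is moved on as each line is run.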

You're mixing tokenising BASIC, which BBC BASIC did, with the embedded ARM assembler, which it didn't tokenise, and then adding in an "assembler's editor", and there wasn't one of those. There were just lines of BASIC program, 10, 20, ..., some of which switched to assembler with a [.

I'm not talking about the BBC specifically at all - the specific system is irrelevant - and so I'm not "mixing" anything. Many 6502-based systems did have assembler editors; many more had "monitors" that would assemble line by line on the fly - if not built in then as common extensions.

(In fact I did most of my M6502 assembly programming in a monitor, with a notepad to keep track of where various functions started; it was only a couple of years after I started doing assembly that I got a proper macro assembler for my C64, and even then, exactly because "every byte mattered", it was not at all uncommon to still stick to a monitor on a cartridge rather than have a macro assembler "waste" precious memory for the assembler and source text.)

What I'm talking about is the general idea that longer keywords would somehow prevent an assembler from using fixed-length records to represent lines. Reading it in the context of what you wrote above, though, I see that your reference to fixed-length records was to the table used for assembling, not to the source lines, in which case it makes slightly more sense to me.

Though not fully, as it'd be both faster and take less code to use custom search code to match the input against the available opcodes than to insist on a fixed-length record. I did a quick check, and it should be doable to save at least a dozen or two bytes and reduce the average search time significantly by range checking and using a lookup table for the first character. It might've been convenient to write the code with fixed records, but it's far from optimal in terms of either performance or code size, so it doesn't seem like code size bothered them that much in this case.
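
The shape of that search, as a sketch only: it shows the range check and first-character lookup, not the byte savings, which depend on the actual machine code. The mnemonic list is a hypothetical subset and `lookup` is a made-up name.

```python
from collections import defaultdict

# Hypothetical subset of 6502 mnemonics; the list position is the result.
MNEMONICS = ["ADC", "AND", "ASL", "BCC", "BCS", "LDA", "LDX", "LDY",
             "STA", "STX", "STY"]

# Lookup table keyed by first character: each bucket holds only the
# two-letter tails of the mnemonics that start with that letter.
BY_FIRST = defaultdict(list)
for i, name in enumerate(MNEMONICS):
    BY_FIRST[name[0]].append((name[1:], i))


def lookup(text: str) -> int:
    """Return the index of the mnemonic at the start of text, or -1."""
    word = text[:3].upper()
    if len(word) < 3 or not ("A" <= word[0] <= "Z"):
        return -1  # range check rejects the input before any table access
    for tail, index in BY_FIRST.get(word[0], ()):
        if tail == word[1:]:
            return index
    return -1
```

Matching "STA" here compares against three tails rather than walking all eleven records, which is where the reduced average search time would come from.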

The "every byte mattered" argument applies to source too on these systems, and I actually find it really curious that they went to the trouble of supporting inline assembly but then didn't apply that optimization to the source, given the limited memory and performance of these systems. This is especially so since the opcode itself makes a very obvious token candidate, potentially leaving the "assembly" step itself reduced to mostly copying data and applying address fixups.
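
A minimal sketch of what "copying data and applying address fixups" could look like, assuming the stored line already holds the opcode byte. The line format (label, opcode, operand) and the name `assemble` are hypothetical; the opcodes used in the usage note are standard 6502 ones (LDA immediate 0xA9, JMP absolute 0x4C, RTS 0x60).

```python
def assemble(lines, origin):
    """Turn opcode-tokenised lines into machine code.

    lines:  list of (label or None, opcode, operand), where operand is
            None, an int (one immediate byte) or a str (a label that
            becomes a two-byte little-endian absolute address).
    origin: address at which the first byte will live.
    """
    out = bytearray()
    labels = {}
    fixups = []  # (offset into out, label name)
    for label, opcode, operand in lines:
        if label is not None:
            labels[label] = origin + len(out)
        out.append(opcode)  # the token is the opcode, so this is a plain copy
        if isinstance(operand, int):
            out.append(operand & 0xFF)
        elif isinstance(operand, str):
            fixups.append((len(out), operand))
            out += b"\x00\x00"  # patched once every label is known
    for offset, name in fixups:
        out[offset:offset + 2] = labels[name].to_bytes(2, "little")
    return bytes(out)
```

For example, `assemble([("loop", 0xA9, 0x01), (None, 0x4C, "loop"), (None, 0x60, None)], 0xC000)` yields the bytes A9 01 4C 00 C0 60; the only real work is resolving the label.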