by Zenst 5098 days ago
For an assembly language designed for humans, they sure did like their TLAs.

How hard would it have been to have STORE_ADDRESS instead of STA?

To really learn an assembly language you really need to write code, and to write code you really have to have a project or purpose.

Now the 6502 is one of the oldest assembly languages still in active use, as it still does very well in the microcontroller sector. That said, ARM is also in that area, and ARM-compatible kit is a lot cheaper to obtain. ARM was born out of frustration with the limitations of the original 6502 CPU, so it may be a better, more practical use of your educational time.

That all said, every programmer of any language should at least learn or play with one assembly language sometime in their life, maybe one or two. I remember after my ZX81 I opted for the Oric-1 over the Spectrum just because it had a different CPU (6502), and after that I opted for the Atari ST (68000) and an Amstrad PC (x86).

Also, inventing your own CPU/assembler is not as hard and intimidating as a lot of people will think. All are very rewarding and a good use of your time on a rainy day.

4 comments

Out of interest, how old are you?

I remember writing programs that wouldn't have fitted in memory if we'd used things like "STORE_ADDRESS" instead of "STA". The assembler would have had to have been more complex in order to process instructions that were of variable length, instead of the opcodes being a predictable 3 letters.

I've written assembler by hand - sheets and sheets of it - because there wasn't a decent editor on the machine I was writing for. These were the days when you were writing code for the machine, and not for the people who would maintain it afterwards. The structure of the code had to be clear, and the comments were as much for yourself as anyone else, but the opcode names were a complete non-consideration. If you didn't know them, you couldn't program anyway.

45, and the whole point I was making is:

1) If you are learning something new, you might as well have something easier to learn.
2) Just because historically you had to use abbreviations, that is not a handicap you have to impose upon yourself these days, especially if you're learning it from scratch and for educational purposes.
3) Sure, you can use short TLAs instead of a longer version, but for ease of reading and learning, something a lot clearer for the human makes sense; the compromise once forced by computer memory is now an artificial limitation.
4) It's really not hard to run a substitution script to convert long to short and vice versa. sed, anyone?
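The substitution idea in point 4 is easy to try. Here is a minimal sketch in Python rather than sed. The long names in the table are made up for illustration (only the three-letter forms are official 6502 mnemonics), and `translate` is a hypothetical helper, not part of any real assembler.

```python
import re

# Hypothetical long names mapped to the official 6502 mnemonics.
LONG_TO_SHORT = {
    "LOAD_ACCUMULATOR": "LDA",
    "STORE_ACCUMULATOR": "STA",
    "LOAD_Y": "LDY",
    "INCREMENT_Y": "INY",
    "BRANCH_IF_NOT_EQUAL": "BNE",
    "RETURN_FROM_SUBROUTINE": "RTS",
}
SHORT_TO_LONG = {short: long for long, short in LONG_TO_SHORT.items()}


def translate(source, table):
    """Replace every whole-word key of `table` found in `source`."""
    # \b keeps labels such as LDA_TABLE intact, because "_" is a word character.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda match: table[match.group(0)], source)


if __name__ == "__main__":
    verbose = "LOOP  LOAD_ACCUMULATOR ORIGIN,Y\n      STORE_ACCUMULATOR DEST,Y"
    print(translate(verbose, LONG_TO_SHORT))
```

Note that this rewrites matching words in comments too, so a real script would want to skip everything after the comment character.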

We have all done assembly by hand and hand-converted it; the compiler was a luxury for some back then on their small-memory machines, and even then you were not limited to the official short-code TLAs.

The thing with hand converting is that you write something not maintainable on many levels but, as you said, you bent towards those limitations as you did not have a lot of choice.

So if, on say a Z80, you wanted to write RETURN, or the official mnemonic RET, or go really hard-core and just write C9 (using this example as my personal memory space seems to have kept that one alive), then it was your choice. When you went to code it, it was C9h, so converting RETURN or RET was something you did anyway.

At least the only opcode that was standard across CPUs was NOP or "NO OPERATION", aka do nothing, or 00h or 0 or 00000000; that was kinda portable and used by many for funky double-entry code padding etc. Though that was due to memory limitations and scary stuff to maintain, yet fun and rewarding to code. Early Apple OSes used that approach a lot due to memory limitations.

Heck, if memory was such a limitation back then, explain COBOL, because I can't; sadly I still remember that as well :|.

Well, there's no point in getting out of shape about it, and I've got lots of other more important things to do than trying to convince you of this, but it seems to me that learning 6502 is already pretty useless as compared with learning something like ARM7 or StrongARM. Even there the assemblers still use TLAs for the operations. I honestly feel that it just seems right to maintain the contextual relevance and, in some sense, the culture of assembler.

If you want readability then by all means use Python or Go or Ruby or something like that. I don't know anyone who writes in assembly who doesn't use the TLAs (or similarly concise designations) for the operations, no matter what processor they're using. It feels to me like there is something natural about it.

But even beside that, I personally find that abbreviations make it easier to think in whatever subject I'm working on. When I write in assembler I think "MOV" - I don't think "move". Jargon in any field is there to make communication faster and more effective, and linguistics says that common expressions get shorter over time.

So I think you're trying to improve the wrong thing, and while to some it may seem obvious that spelling out operations more verbosely and making them more obvious will help people learn, I'm not convinced. Sometimes concise, precise and semi-opaque terms can actually help learners.

I thought the original post was wrong to choose 6502. "So, it was designed to be written by humans. More modern assembly languages are meant to written by compilers, so let’s leave it to them. Plus, 6502 is fun. Nobody ever called x86 fun." I assume he's never written ARM. It was designed to be written by hand, is delightful to write, much more orthogonal than 6502, and still relevant today.

STA didn't mean store address anyway; what would your long mnemonics for store X or store Y be?

Having learned 6510 asm when I was younger, mov always seemed backwards and magical to me.

I'm not convinced. There's so little to your average assembly language that making the mnemonics longer won't help. With 6502, it would be totally pointless. You're going to be spending, what, 1 week learning this stuff, and then the next N years using it. You'll get used to it quickly enough. It makes more sense to optimise for experts, than it does for people who don't yet know what they're doing.

(And anyway, where do you stop? If you can't remember that STA means store accumulator and LDA means load accumulator, how will you remember what (&70),Y means, or what flags they use, or how many cycles they take? You'll end up with something like SUBTRACT FROM ACCUMULATOR MEMORY IN ADDRESS STORED IN &70 WITH Y REGISTER AND INVERTED CARRY FLAG WITH RESULT AFFECTING N AND Z AND C CLEARED IF BORROW AND V SET IF OVERFLOW TAKING 6 CYCLES PLUS PAGE BOUNDARY CROSSING PENALTY ;) - and even that probably isn't clear enough, because how will the poor reader know what the page boundary crossing penalty is if they don't know already?)

If you have something like x86's PUNPCKHBW, or POWER's rlinmw, and try to describe what they do clearly, you'll end up in even more of a mess. A one-volume instruction reference manual, sorted by opcode, with diagrams and pseudocode, would be far more useful.

As if to do almost the exact opposite of backing up my point, the PPC opcode I was thinking of is in fact `rlwimi' - Rotate Left Word Immediate then Mask Insert. I was thinking more like, Rotate Left and Insert Mask Word. Oh well.

So maybe longer opcodes would help, but I'd have got it wrong in either event - and I'd still need to have double checked the docs, to remind myself, again, just what the hell it does exactly.

IIRC, on 6502, 0x00 is BRK, not NOP. http://www.masswerk.at/6502/6502_instruction_set.html agrees with that.

I forgot for every rule there is always an exception - cheers.

STA is `store accumulator' not address. Would you have liked to type STORE_ACCUMULATOR with the high frequency that the instruction occurs on an editor that had no auto-complete? Even reading it is slower than STA. And having them be all three letters meant assemblers could pack the text into memory in fixed-length records; every byte mattered.

Here's the table of ARM mnemonics in the source to Acorn's BBC BASIC for the ARM. https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources... For the 6502, space was tight enough that it was packed to fewer than three bytes per mnemonic.
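For a rough idea of how three letters can fit in under three bytes: with only 26 letters, each one needs 5 bits, so three letters need 15 bits and fit in two bytes. The sketch below shows one possible scheme in Python, purely as an illustration. It is an assumption, not a claim about the layout Acorn's 6502 BASIC actually used, and `pack` and `unpack` are hypothetical names.

```python
def pack(mnemonic):
    """Pack a three-letter mnemonic into two bytes, 5 bits per letter."""
    if len(mnemonic) != 3 or not (mnemonic.isascii() and mnemonic.isalpha()):
        raise ValueError("expected exactly three ASCII letters")
    value = 0
    for ch in mnemonic.upper():
        # A..Z map to 1..26, which fits in 5 bits.
        value = (value << 5) | (ord(ch) - ord("A") + 1)
    return value.to_bytes(2, "big")


def unpack(packed):
    """Recover the three-letter mnemonic from its two-byte packed form."""
    value = int.from_bytes(packed, "big")
    return "".join(
        chr(((value >> shift) & 0x1F) + ord("A") - 1) for shift in (10, 5, 0)
    )


if __name__ == "__main__":
    for name in ("LDA", "STA", "BNE"):
        print(name, pack(name).hex(), unpack(pack(name)))
```

Packing like this only works because every mnemonic has the same length; a name such as STORE_ACCUMULATOR would need a length byte or a terminator as well.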

ARM was born from Acorn's frustration with 16-bit CPUs that they considered as successors to the 6502, e.g. 68000, not the 6502.

You're right https://sites.google.com/site/6502asembly/6502-instruction-s...

"ARM was born from Acorn's frustration with 16-bit CPUs that they considered as successors to the 6502, e.g. 68000, not the 6502" yes and no, we're both right. Acorn looked at 16-bit replacements for the 6502 and found that options such as the 68000 did not have the performance they wanted. They went to America and checked out the work on the replacement for the 6502 and concluded that they could just make their own CPU, and so they did.

Had the replacement for the 6502 not been a one man team then history would be different now.

>  And having them be all three letters meant assemblers could pack the text into memory in fixed-length records; every byte mattered.

As much as I agree with using the mnemonics, this is a bogus argument. Even C64 BASIC tokenized stuff before storing it, because there's no reason to store the name at all. In fact, if you prefer, the 6502 instruction set is small enough to represent it in the assembler's editor as a single byte index into an array. Or you could just use the opcode itself.
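The single-byte-index idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the table holds a small subset of the 56 official 6502 mnemonics, labels are not handled, and `tokenise` and `detokenise` are hypothetical names, not taken from any real editor.

```python
# Small subset of the official 6502 mnemonics; the index is the stored token.
MNEMONICS = ["BNE", "INY", "LDA", "LDY", "RTS", "STA"]


def tokenise(line):
    """Store a source line as one token byte followed by the operand text."""
    mnemonic, _, operand = line.strip().partition(" ")
    token = MNEMONICS.index(mnemonic.upper())
    return bytes([token]) + operand.strip().encode("ascii")


def detokenise(record):
    """Rebuild the printable line from its tokenised form."""
    mnemonic = MNEMONICS[record[0]]
    operand = record[1:].decode("ascii")
    return f"{mnemonic} {operand}".rstrip()


if __name__ == "__main__":
    stored = tokenise("STA DEST,Y")
    print(len("STA DEST,Y"), "bytes as text,", len(stored), "bytes tokenised")
    print(detokenise(stored))
```

Once the mnemonic is a token, its printed length no longer affects how much memory the stored source takes, which is the point being made about the name not needing to be stored.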

BBC BASIC tokenised BASIC keywords before the line was stored in memory, e.g. PRINT was represented by a single byte. Some tokens needed more than one byte, especially in later versions.

But I'm talking about assembler here. BBC BASIC has a built-in assembler, e.g. 6502, Z80, or ARM, depending on the CPU it's running on. The assembler source in the BASIC program is not tokenised on input but stored as plain text. The embedded lines of assembler, wrapped in [...], are still lines of BASIC, and when they get run the machine code is assembled at the address in BASIC's P% integer register variable and P% is moved on. At that point of execution BASIC must hunt for the mnemonic, stored in the "tokenised" BASIC line as plain text, in its table; the table I reference in the case of ARM BASIC. That table can be laid out as it is because each mnemonic is three characters long, e.g. mov, ldr, stm, and bic.

You're mixing tokenising BASIC, which BBC BASIC did, and the embedded ARM assembler, which it didn't, and then adding in an "assembler's editor", and there wasn't one of them. Just lines of BASIC program, 10, 20, ..., some of which switched to assembler with a [.

I'm not talking about the BBC specifically at all - the specific system is irrelevant - and so I'm not "mixing" anything. Many 6502 based systems did have assembler editors; many more had "monitors" that would assemble line by line on the fly - if not built in then as common extensions.

(in fact I did most of my M6502 assembly programming in a monitor, with a notepad to keep track of where various functions started; it was only a couple of years after I started doing assembly that I got a proper macro assembler for my C64, and even then exactly because "every byte mattered" it was not at all uncommon to still stick to a monitor on a cartridge rather than have a macro assembler "waste" precious memory for the assembler and source text)

What I'm talking about is the general idea that longer keywords somehow would prevent an assembler from using fixed length records to represent lines. Reading it in context of what you wrote above, though, I see your reference to fixed length records referred to the table used for assembling, not to the source lines, in which case it makes slightly more sense to me.

Though not fully, as it'd be both faster and take less code to use custom search code to match the input against the available opcodes than to insist on a fixed length record - I did a quick check and it should be doable to save at least a dozen or two bytes and reduce the average search time significantly by range checking and using a lookup table for the first character. It might've been convenient to write the code with fixed records, but it's far from optimal in terms of either performance or code size, so it doesn't seem like code size bothered them that much in this case.
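The range check plus first-character lookup can be sketched as follows. It is written in Python for readability rather than 6502 code, so it says nothing about the byte counts mentioned above. The mnemonic list is a small made-up sample that deliberately includes one long name to show that fixed width is not required, and `build_index` and `lookup` are hypothetical names.

```python
# Sorted sample table; one entry is deliberately not three letters long.
MNEMONICS = sorted(
    ["ADC", "AND", "BNE", "INY", "LDA", "LDY", "RTS", "STA", "STORE_ACCUMULATOR"]
)


def build_index(names):
    """Return 26 (start, end) ranges into the sorted list, one per first letter."""
    buckets = [(0, 0)] * 26
    for i, name in enumerate(names):
        slot = ord(name[0]) - ord("A")
        start, end = buckets[slot]
        # An empty bucket has start == end; otherwise just extend its end.
        buckets[slot] = (i if start == end else start, i + 1)
    return buckets


def lookup(word, names, buckets):
    """Return the table index of `word`, or None if it is not a mnemonic."""
    word = word.upper()
    if not word or not ("A" <= word[0] <= "Z"):  # range check on first character
        return None
    start, end = buckets[ord(word[0]) - ord("A")]
    for i in range(start, end):  # only entries sharing the first letter
        if names[i] == word:
            return i
    return None


if __name__ == "__main__":
    index = build_index(MNEMONICS)
    print(lookup("sta", MNEMONICS, index), lookup("XYZ", MNEMONICS, index))
```

Each lookup only compares against entries that share the first letter, so the number of string comparisons drops without the table needing fixed-length records.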

The "every byte mattered" applies to source too on these systems, and I actually find it really curious that they went to the step of supporting inline assembly but then didn't apply that optimization to the source given the limited memory and performance of these systems. Especially since the opcode itself makes a very obvious token candidate, potentially leaving the "assembly" step itself reduced to mostly copying data and applying address fixups.

STORE_ACCUMULATOR may be easier to understand than STA, if you're seeing it for the first time, but it's not easier to use once you know what it means.

Incidentally, I wrote a simple 6502 assembler back in the day, and I took the opportunity to invent my own notation, just for fun. I became quite adept at reading and writing it, and standard notation felt very verbose after a while.

Here is, in standard notation, a program that copies 256 bytes from ORIGIN to DEST.

        LDY #$00
  LOOP  LDA ORIGIN,Y
        STA DEST,Y
        INY
        BNE LOOP
        RTS

Here is the same program in my notation (yes, all on one line; I used a space character to delimit instructions).

  Y<0 LOOP: A<(ORIGIN,Y) A>(DEST,Y) Y+ #LOOP ]
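Whichever notation is used, the assembler's output is the same handful of bytes. As a rough sketch of the hand-assembly discussed elsewhere in this thread, the Python below encodes the copy loop using the standard 6502 opcodes for these addressing modes. The addresses chosen for ORIGIN and DEST ($2000 and $3000) are assumptions made up for the example, and `assemble_copy_loop` is a hypothetical helper.

```python
ORIGIN = 0x2000  # assumed address, not from the original program
DEST = 0x3000    # assumed address, not from the original program


def lo_hi(addr):
    """Split a 16-bit address into little-endian low and high bytes."""
    return [addr & 0xFF, addr >> 8]


def assemble_copy_loop(origin, dest):
    """Hand-assemble the 256-byte copy loop into 6502 machine code."""
    code = [0xA0, 0x00]               # LDY #$00
    loop = len(code)                  # offset of the LOOP label
    code += [0xB9] + lo_hi(origin)    # LDA origin,Y  (absolute,Y)
    code += [0x99] + lo_hi(dest)      # STA dest,Y    (absolute,Y)
    code += [0xC8]                    # INY
    # Branch offsets are relative to the address just after the 2-byte BNE.
    offset = loop - (len(code) + 2)
    code += [0xD0, offset & 0xFF]     # BNE LOOP
    code += [0x60]                    # RTS
    return bytes(code)


if __name__ == "__main__":
    print(assemble_copy_loop(ORIGIN, DEST).hex(" "))
```

The whole routine comes to 12 bytes, and the backward branch is stored as the two's complement of 9.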
STORE_ADDRESS takes more than 4 times the memory of STA. RAM was not cheap! Also having fixed width and shorter opcodes made assemblers faster and easier to write.

Erm, you don't run assembly languages; you still compile them to machine code!

Fixed width is easier to process and not necessarily to write. Remember, it is about learning an assembler here, not pandering to the limited computer memory and processing approaches of the time. THAT is a separate issue, and on that note thanks for the mod-down point ;|.

As pjmlp points out, not everyone compiled to machine code by hand - I talked about assemblers being faster, not assembly. Those who did work by hand would also appreciate fixed width and short opcodes, and squared paper...

If this is about learning 6502, then rewriting the official assembly into something new would be counterproductive. But don't blame me for touching your mod points.

If it's about learning 6502 then it's not hard to run a sed script and convert your version into the official version. Work with what you find easiest and use the computer to do the hard work.

I've done hand coding, and those who have will agree it's an education in futility and painful, unneeded processing for sadists. Coding sheets are fun, but when you can type faster than you can write they are very annoying.

Now back in the early home micro days you had no real choice but to hand code your assembly into machine code, and in that, fixed coding sheets really made no difference at all; if anything, I found they got in the way, apart from screen design.

The point is that in this day and age, being forced into learning TLAs when you can have something meaningful is really not needed, but that's another story.

Is this about learning 6502 or learning assembler? They are separate areas. The 6502 has a nice history to read about and was done back in the time when one chap could invent a CPU, one man could write an application, etc. Nowadays it's not as easy due to size, complexity, etc.

If you want to teach somebody something, is imposing the artificial limitations of those days really needed as an extra level of distraction? We can agree to disagree upon that.

You still needed to write the program before assembling it, so your suggestion would be taking too much memory in the assembler editor.

Memory is not an issue these days for assembly language. If you impose such limits today for the sake of history, you may as well cut up bits of cereal boxes, get a hole punch and make your own punched cards!

6502 is also not for "these days". That said, it's a tutorial about _learning_ it, not redesigning it to be modern and easier.