| HN Mirror

by exikyut 2842 days ago

As someone interested in understanding/getting into assembly language (actually for years), I have a small torrent of naive and hopefully not too annoying questions. :)

- Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.

- argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

- Continuing along the implicit linked-ness train, how is accessing AL reading the value set in SI?

- How does the LEA usage here work? You're putting the address of "[ecx+eax-'0']" into ecx. First question, how is ecx not clobbered? Secondly, how does that dereference work? I see similar semantics used in the 2nd and 3rd instructions, it seems AX and CX are linked in some way (in this situation).

- So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?

- At the end you have a loop around a single xadd instruction, (which, if I understand what's happening, is the equivalent of "edx += eax" here?). The only way I can interpret this is that the atoiloop routine was actually building up a stack (in the LEA bit) and this is ticker-tape-ing its way through that?

I appreciate any insight and your patience :) if I'm learning something unfamiliar enough to be completely disorientating, reading through insights others have written down (ie, the questions others asked and documented the answers to) doesn't seem to help. I have to ask a thousand questions of my own in order to get my bearings.

This could be the fault of the assembly "tutorials" out there though. I can't remember the number of articles I've eye-glazedly read through; I understand binary, the stack, the basics, etc, incredibly incredibly well, but (as you can see) have tons of blocking issues and holes in my understanding. Some I'm aware of - for example, I know I have no mental model of how x86 memory management is done, and I've never been able to find anything concise that just deals with that - but unfortunately some of the holes are so big all I know is that something's missing and that my eyes glaze over and I don't know why. In any case, learning difficulties FTL. Been wanting to understand asm since 2006.

NB. While responding to another answer I stumbled on https://codegolf.stackexchange.com/a/135618/15675, a(n exhaustingly) exhaustive exploration of Fibonacci in x86, also for Linux.

2 comments

saagarjha 2842 days ago

> Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.

The kernel puts them there when the program starts; I believe this is part of the System V ABI (its process-startup stack layout) rather than anything Linux-specific.
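A rough C model of that layout may help. It assumes the code runs at the raw 32-bit process entry point (not inside a C `main`), where `[esp]` holds argc and the argv pointers follow in 4-byte slots. `SLOT_BYTES`, `argv_offset` and `load_slot` are names made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit process entry, so every stack slot is 4 bytes wide.
   esp+0 -> argc, esp+4 -> argv[0], esp+8 -> argv[1], ..., then a NULL slot. */
enum { SLOT_BYTES = 4 };

/* Byte offset from esp at which the pointer argv[i] is stored. */
static size_t argv_offset(size_t i) {
    return SLOT_BYTES * (1 + i);   /* the +1 skips the argc slot */
}

/* Model of "mov reg, [esp+offset]" against a simulated stack.
   The host's pointers may be wider than 4 bytes, so the byte offset
   is converted to a slot index instead of being used directly. */
static const void *load_slot(const void *const *stack, size_t esp_offset) {
    return stack[esp_offset / SLOT_BYTES];
}
```

With a simulated stack of `{argc, argv[0], argv[1], NULL}`, `load_slot(stack, 8)` returns the `argv[1]` pointer, which is what `mov esi, [esp+8]` fetches in this model.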

> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.

> How does the LEA usage here work? You're putting the address of "[ecx+eax-'0']" into ecx. First question, how is ecx not clobbered? Secondly, how does that dereference work? I see similar semantics used in the 2nd and 3rd instructions, it seems AX and CX are linked in some way (in this situation).

lea just does math, it doesn't dereference anything. ecx is being clobbered, but that's what we want: the lea is doing the equivalent of ecx += eax - '0'.

> So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?

This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.

exikyut 2842 days ago

>> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

> Where do you see this?

Here:

  mov esi, [esp+8]  ; get argv[1]                value written to esi; okay
  xor eax, eax      ; clear the upper bits       \ how is writing to eax
  xor ecx, ecx      ; will hold converted value  / and ecx now relevant?!

> lea just does math, it doesn't dereference anything.

Right; I said "LEA usage"; the instruction itself I understand, but "[ecx+eax-'0']" is throwing me for a loop.

T-hawk 2841 days ago

LEA does math, but with a particular structure to the inputs: a pointer plus an index plus a constant. That's meant to be used as a C compiler would, where the pointer would be to a struct array, the index would be the position within the array, and the constant would be the byte offset within the struct.

Calculating a memory address in that way is a common operation, so x86 processors give it special support with hardware that can calculate that sum all in one step.

Most often the computed address is used in a MOV instruction. LEA provides access to the address-computation hardware without doing the actual data move. That's useful because some math operations can be done faster through that hardware than with general-purpose math instructions. LEA requires that pattern of register + register*shift + constant (we aren't using the shift here).

    imul ecx, ecx, 10
    lea ecx, [ecx+eax-'0']

Those two instructions multiply ecx by 10 and then add the next digit (stored as ASCII in eax) to it. Subtracting '0' converts that digit from ASCII to its numerical value. It just so happens that LEA can execute the add and conversion fast and together in one step, because that combined operation matches the pattern of computing an address.
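As a sketch, the same two instructions can be written out in C. The variable names mirror the registers; `atoi_step` and `atoi_model` are made-up names, and stopping at the string's NUL terminator is an assumption, since the loop's exit condition is not quoted here.

```c
#include <stdint.h>

/* One iteration of the digit loop:
     imul ecx, ecx, 10
     lea  ecx, [ecx+eax-'0']
   lea only computes the sum and writes it to ecx; nothing is read from memory. */
static uint32_t atoi_step(uint32_t ecx, uint32_t eax) {
    ecx = ecx * 10;          /* imul ecx, ecx, 10 */
    ecx = ecx + eax - '0';   /* lea ecx, [ecx+eax-'0'] */
    return ecx;
}

/* Assumed loop shape: feed each ASCII digit through atoi_step
   until the terminating NUL byte is reached. */
static uint32_t atoi_model(const char *s) {
    uint32_t ecx = 0;                        /* xor ecx, ecx */
    while (*s != '\0') {
        uint32_t eax = (unsigned char)*s++;  /* next character, zero-extended */
        ecx = atoi_step(ecx, eax);
    }
    return ecx;
}
```

For example, with ecx holding 4 and eax holding '2' (0x32), one step gives 4*10 + 0x32 - 0x30 = 42.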

willvk 2841 days ago

I didn't know you could do that with lea. That's pretty cool!

saagarjha 2842 days ago

Sorry, I'm actively editing my answer as I get a better sense of your questions and can answer more of them. Please see my edits; also, the specific lines you pointed out are not related to reading in argv[1]. They're just zeroing out eax and ecx: eax because we will only be writing to the lower bits with the lodsb and don't want our arithmetic to be fouled by anything left over in the higher bits, and ecx because we are continuously adding to it in atoi, so it needs to start off at zero.
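To make the "lower bits" point concrete, here is a small C model of lodsb. It assumes the direction flag is clear (so esi moves forward), and the `Regs` struct and `lodsb` function are invented for the sketch; only the low byte of eax is replaced, which is why stale upper bits would survive without the earlier xor.

```c
#include <stdint.h>

/* Just the two registers this sketch needs. */
typedef struct {
    uint32_t eax;
    const uint8_t *esi;
} Regs;

/* lodsb: load the byte at [esi] into the low 8 bits of eax, then advance esi.
   The upper 24 bits of eax are left untouched.
   Assumption: direction flag clear, so esi is incremented. */
static void lodsb(Regs *r) {
    r->eax = (r->eax & 0xFFFFFF00u) | *r->esi;
    r->esi += 1;
}
```

Starting from eax = 0xDEADBEEF and a string beginning with '7' (0x37), the model leaves eax at 0xDEADBE37; starting from eax = 0 it leaves 0x37, which is the clean digit the later arithmetic wants.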

userbinator 2842 days ago

> They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.

lodsb (b for Byte) loads into al, not ax. It's lodsw (Word) which uses ax, and of course there's lodsd for eax and even lodsq for rax (64-bit only).

> This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.

Actually that's the non-error path; eax is set to 1 and edx to 0 as initial values for computing fib(n) in the fibloop.
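A C model of the xadd loop shows why those initial values work. The operand order (`xadd edx, eax`) and the countdown in ecx are assumptions here, since the original listing is not quoted in this thread; `xadd` and `fib_model` are names chosen for the sketch.

```c
#include <stdint.h>

/* xadd dest, src: dest receives dest + src, and src receives the OLD dest.
   It is an add and an exchange in a single instruction. */
static void xadd(uint32_t *dest, uint32_t *src) {
    uint32_t sum = *dest + *src;
    *src = *dest;
    *dest = sum;
}

/* Assumed loop shape: n iterations of "xadd edx, eax",
   starting from eax = 1 and edx = 0. */
static uint32_t fib_model(uint32_t n) {
    uint32_t eax = 1, edx = 0;
    for (uint32_t ecx = n; ecx != 0; ecx--) {
        xadd(&edx, &eax);
    }
    return edx;
}
```

So it is a bit more than "edx += eax": each pass also moves the previous edx into eax, which keeps the last two Fibonacci numbers in the register pair. Under these assumptions, ten iterations leave 55 in edx. No stack is being consumed; the atoi loop only built up the count.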

saagarjha 2842 days ago

Oops, you're right. I should have read that more carefully…

exikyut 2841 days ago

NB Addendum: https://i.imgur.com/u6jtIAZ.png
