by thedigitalengel 4722 days ago
It isn't truly arch-independent as long as it assumes PARAMETER_CONSTANT and FUNCTION_CONSTANT will be stored as direct immediates in the generated code. On some archs, for instance, 0xFEEDBEEF might be too big a constant for a single immediate, and the compiler would then be forced to move it into a register (or a stack slot) in parts.

Edit: and of course, you run the risk that on some archs, 0xFEEDBEEF is actually a valid encoding for some instruction. :)

For examples of CPUs that behave differently, look at RISC CPUs. 32-bit PowerPC, for example, would translate an immediate long load into a 'load short into high word and zero out low word' instruction followed by a signed immediate addition (it would load $DEAE first, then add -$4111 to get $DEADBEEF).
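The arithmetic behind that two-step load can be sketched in C. This is only a model of the split, not compiler output: `split_imm`, `split_constant` and `rebuild_constant` are hypothetical names, and the sketch assumes the second step adds a sign-extended 16-bit immediate, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* The two immediates of a "load high halfword, then add signed low" pair.
 * Hypothetical type, used only for illustration. */
typedef struct {
    uint16_t high; /* placed in the upper 16 bits, lower 16 bits zeroed */
    int16_t  low;  /* added afterwards as a signed 16-bit immediate */
} split_imm;

/* Split a 32-bit constant so that (high << 16) + low == value (mod 2^32). */
split_imm split_constant(uint32_t value)
{
    split_imm s;
    int32_t low = (int32_t)(value & 0xFFFFu);

    /* The addition sign-extends, so a low half with its top bit set
     * must be treated as a negative number. */
    if (low >= 0x8000)
        low -= 0x10000;

    /* Compensate in the high half: subtracting a negative low part
     * bumps the high half up by one. */
    s.high = (uint16_t)((value - (uint32_t)low) >> 16);
    s.low = (int16_t)low;
    return s;
}

/* Recombine the two immediates the way the instruction pair would. */
uint32_t rebuild_constant(split_imm s)
{
    uint32_t reg = (uint32_t)s.high << 16;    /* first step: high half */
    assert(s.low >= -0x8000 && s.low <= 0x7FFF);
    return reg + (uint32_t)(int32_t)s.low;    /* second step: signed add */
}
```

For 0xDEADBEEF the low half 0xBEEF has its top bit set, so it becomes -0x4111 and the high half becomes 0xDEAE, which matches the numbers in the comment.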

The list of problems is way longer, by the way. This code makes assumptions about pointer size (I don't think it will run on x64 with common ABIs).
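To make the pointer-size point concrete, here is a small sketch. The helper names are made up for illustration and are not from the code being discussed. It relies only on the fact that `uintptr_t`, where the implementation provides it, can hold a `void *` and convert it back unchanged; no such guarantee exists for function pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if, judging by size alone, an object pointer would fit in a
 * 32-bit integer on this platform. On common x64 ABIs this returns 0. */
int object_pointer_fits_in_u32(void)
{
    return sizeof(void *) <= sizeof(uint32_t);
}

/* Round-trips an object pointer through uintptr_t instead of a
 * fixed-width integer. This covers object pointers only; converting a
 * function pointer this way is not guaranteed to work. */
void *round_trip_object_pointer(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}
```

Code that stores addresses in a plain 32-bit integer silently truncates them wherever the first helper returns 0.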

There also is no guarantee that function pointers point to the memory where the function's code can be found (there could be a trampoline in-between, or a devious compiler writer could encrypt function addresses).

Neither is there a guarantee that functions defined sequentially get addresses that are laid out sequentially (there is no portable way to figure out the size of a function in bytes).

Finally, I don't think there is a guarantee that one can read a function's code (its memory could be marked 'execute only').

I guess those more familiar with the C standard will find more portability issues.

ARM (quite common these days) will sometimes do this as well, as the immediate value of an instruction has to fit inside the fixed-width instruction, and the instruction itself is only 32 bits (some of them required for overhead). While it is slightly more likely you will see a PC-relative load, that requires an extra data fetch; so, instead, a "movt" (move top) instruction is used to set the upper 16 bits of a register after first setting the lower 16 bits. This requires just as much space as the load and separate data literal, and runs entirely out of the instruction cache.
What I once saw an ARM compiler do is put the 0xDEADBEEF into a data area (in the 'text' segment, I think) and then load it from there.

Like it would load a string, but since it's only 4 bytes, it can load it in one step.

Not certain if you are disagreeing, so to clarify: you can either do a load like that or two moves. The load requires one instruction word and one data word, so two words. The moves require two instruction words, so also two words. The load, however, requires a data fetch. This fetch has to come from somewhere nearby, as the load instruction cannot offset very far, and so is going to be in the same segment as the code (called "text", for historic reasons: this has nothing to do with strings, which will be stored in the "data" segment, or sometimes even a "strings" segment), and most of the time will be directly after (or, for long functions, inside) the code for that function. This is a trade-off: moves are fast (cycle counts of instructions are not all the same, so "single step" doesn't mean much), have better guarantees that they will always have the same result (code changing has much heavier penalties than data changing), and might use up an asynchronous memory access that other parts of your code might be saturating (although I honestly am not certain if ARM does this like other architectures do). It thereby comes down to circumstance, configuration, and the whims of the people who wrote your compiler as to whether you will get mov+movt or a single ldr pc,.X+data.
I'm not disagreeing, just mentioning it.
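As a rough illustration of the two-move variant, the sketch below models in C what the pair of instructions does to a register. The function names are invented for this example. It assumes the first move writes a zero-extended 16-bit immediate and that movt replaces only the upper 16 bits, leaving the lower half untouched.

```c
#include <stdint.h>

/* Immediate carried by the first move: the low 16 bits of the constant. */
uint16_t low_half(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Immediate carried by movt: the high 16 bits of the constant. */
uint16_t high_half(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Model of the first move: the register becomes the zero-extended immediate. */
uint32_t model_mov_low16(uint16_t imm)
{
    return (uint32_t)imm;
}

/* Model of movt: overwrite the top 16 bits, keep the bottom 16 bits. */
uint32_t model_movt(uint32_t reg, uint16_t imm)
{
    return (reg & 0xFFFFu) | ((uint32_t)imm << 16);
}

/* Build a 32-bit constant with the two-instruction sequence. */
uint32_t build_with_two_moves(uint32_t value)
{
    uint32_t reg = model_mov_low16(low_half(value));
    return model_movt(reg, high_half(value));
}
```

Unlike the PowerPC-style sequence, no sign correction is needed here, because the second instruction replaces bits rather than adding to them.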

"It thereby comes down to circumstance, configuration, and the whims of the people who wrote your compiler as to whether you will get mov+movt or a single ldr pc,.X+data."

Yes, this sums it up nicely.