by burfog 2817 days ago
The instruction pointer is the IP register. It is zero. It does not contain -16 or 0xfffffff0. The linear address is a different thing (computed from the CS base plus the IP/EIP/RIP content), as is the physical address.

Unless something has changed in recent hardware, there aren't 12 address lines just asserted. This is a side effect of the CS base being a particular value.

An important thing to realize is that x86 has hidden registers associated with segments. These registers get set when a segment selector register is loaded, not when it is used. The CS base is one of these hidden registers. If CS is loaded in protected mode, the base comes out of the descriptor table, and it remains when switching back to real mode (this is "unreal mode"). If CS is loaded in real mode, the base comes from the selector shifted left by 4 bits, and this base remains even if you switch to protected mode. Switching modes doesn't change a segment base; only loading a segment register does.
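To make the hidden-base behaviour concrete, here is a minimal toy model in Python. It is only a sketch: the class name, the method names, and the dict standing in for a descriptor table are all made up for illustration, and real descriptors carry limits and access rights that are ignored here.

```python
class SegmentRegister:
    """Toy model: a visible 16-bit selector plus a hidden (cached) base."""

    def __init__(self, selector=0, base=0):
        self.selector = selector
        self.base = base

    def load_real_mode(self, selector):
        # Real-mode load: the hidden base becomes selector << 4.
        self.selector = selector & 0xFFFF
        self.base = self.selector << 4

    def load_protected_mode(self, selector, descriptor_table):
        # Protected-mode load: the hidden base is copied from the descriptor.
        # descriptor_table is a hypothetical dict: index -> {"base": int}.
        self.selector = selector & 0xFFFF
        self.base = descriptor_table[self.selector >> 3]["base"]


def linear_address(seg, ip):
    # The linear address uses the hidden base, never the visible selector.
    return (seg.base + ip) & 0xFFFFFFFF


cs = SegmentRegister()
cs.load_protected_mode(0x08, {1: {"base": 0x00100000}})
# A mode switch would go here. Nothing in the model touches cs.base,
# so the protected-mode base survives until CS is loaded again.
print(hex(linear_address(cs, 0x10)))

cs.load_real_mode(0xF000)
print(hex(cs.base))
```

Note that the model has no "mode" field on the segment at all. That is the point being made above: the base only changes inside the two load methods.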

So initially, the CS base is not set in a way that matches what you would get if you loaded the CS selector value that is seen. It is set to a value that is possibly 0xfffffff0, 0x00000ffffffffff0, 0x0000fffffffffff0, or 0xfffffffffffffff0. The older documentation I've seen would use the largest of those values. I suppose it could then be cut down to 32-bit by the bottleneck that is normally a part of addressing when not in long mode. This is the sort of area where Intel, AMD, and others may differ.

Perhaps there is a hardware debugger for x86 (like a JTAG debugger) that would show the initial CS base. One could also guess that Simics or VMware might be correct, disassembling them to find out what they use. Another idea is to examine the badly-documented state used by the virtualization instructions.


> The instruction pointer is the IP register. It is zero.

It is 0xfff0, at least according to the Intel Software Developer's Manual, Volume 3, section 9.1.4, "First Instruction Executed". Regarding 12 address lines being asserted, that is just a way of thinking about it. The actual implementation might be different, but what happens on reset is akin to the 12 most significant bits being set. CS is 0xf000.

Indeed, a debugger would give the right answer.
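The two ways of thinking about it give the same number, which a few lines of Python can show. This is just arithmetic, not a claim about how any particular CPU implements it. The hidden base value 0xFFFF0000 is the one the SDM section cited above gives for CS after reset; the function and constant names are invented for this sketch.

```python
def shifted_selector_address(selector, ip):
    """Real-mode rule: (selector << 4) + IP."""
    return ((selector & 0xFFFF) << 4) + (ip & 0xFFFF)


UPPER_12_BITS = 0xFFF << 20  # address bits 20..31

# View 1: compute F000:FFF0 the real-mode way, then force the top 12 bits high.
low_20 = shifted_selector_address(0xF000, 0xFFF0)
via_address_lines = low_20 | UPPER_12_BITS

# View 2: a hidden CS base of 0xFFFF0000 plus IP = 0xFFF0.
via_hidden_base = 0xFFFF0000 + 0xFFF0

print(hex(low_20), hex(via_address_lines), hex(via_hidden_base))
```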

Initial IP was 0 on the 8086/8088. I suspect that detailed technical information like this tends to be copy-pasted more than understood, which is why a lot of second-sourced information out there on it is just plain wrong or caveated. The sometimes self-contradicting information in Intel's own docs doesn't help either.

This is what I've figured out from Intel's docs:

    8086/88:   CS:IP = FFFF:0000 first instruction at FFFF0
    80186/188: CS:IP = FFFF:0000 first instruction at FFFF0
    80286:     CS:IP = F000:FFF0 first instruction at FFFFF0
    80386:     CS:IP = 0000:0000FFF0 or F000:0000FFF0[1], first instruction at FFFFFFF0
    80486+:    CS:IP = F000:0000FFF0(?) first instruction at FFFFFFF0
[1] Depending on which datasheet/programmer's reference manual you read. I can't find any reference to someone who actually checked what the hardware did, however.
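One way to read the table is to ask what CS base each row implies, and whether that base matches the selector shifted left by 4. A rough sketch in Python, using the 8086/88 row and both 80386 variants exactly as listed above (the row labels and function names are made up for illustration):

```python
# (label, CS selector, IP, first-instruction address), taken from the table.
ROWS = [
    ("8086/88", 0xFFFF, 0x0000, 0xFFFF0),
    ("80386 (CS=0000)", 0x0000, 0xFFF0, 0xFFFFFFF0),
    ("80386 (CS=F000)", 0xF000, 0xFFF0, 0xFFFFFFF0),
]


def implied_base(first_instruction, ip):
    # linear address = base + IP, so base = linear address - IP
    return first_instruction - ip


def check(rows):
    """Map each label to (implied base, whether it equals selector << 4)."""
    results = {}
    for label, selector, ip, first in rows:
        base = implied_base(first, ip)
        results[label] = (base, base == selector << 4)
    return results


for label, (base, matches) in check(ROWS).items():
    print(f"{label:16} base={base:#010x} matches selector<<4: {matches}")
```

For the 8086/88 row the implied base is just the shifted selector. For both 80386 variants it is 0xFFFF0000, which neither selector value produces when shifted, so the base has to come from somewhere other than the visible CS.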

More interesting reading...

http://www.rcollins.org/Productivity/DescriptorCache.html

http://www.rcollins.org/ddj/Aug98/Aug98.html

https://www.pcjs.org/pubs/pc/reference/intel/80386/loadall/

@userbinator thanks for the clarification! This is indeed useful and helps explain the contradictions.