
by rep_lodsb 189 days ago

Only the Z80 refetched the entire instruction; x86 never did it this way. Each bus transfer (read or write) takes multiple clocks:

    CPU                        Cycles per transfer  Transfer size    Theoretical minimum cycles per byte (block move)
    Z80 instruction fetch      4                    byte
    Z80 data read/write        3                    byte             6
    80(1)88, V20               4                    byte             8
    80(1)86, V30               4                    byte/word        4
    80286, 80386 SX            2                    byte/word        1
    80386 DX                   2                    byte/word/dword  0.5
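
A minimal sketch of the arithmetic behind the last column, assuming a block move needs exactly one bus read and one bus write per unit transferred and nothing else (no instruction refetch, no microcode overhead). The function name and row labels are illustrative, not from the comment. Under that assumption the Z80, 8088 and 8086 rows come out as 6, 8 and 4. The two-clock buses come out as 2 and 1, whereas the 1 and 0.5 listed correspond to counting a single transfer per unit.

```python
from fractions import Fraction

def min_clocks_per_byte(clocks_per_transfer, bytes_per_transfer):
    """Lower bound for a block move: one read plus one write per unit moved."""
    return Fraction(2 * clocks_per_transfer, bytes_per_transfer)

# (label, clocks per bus transfer, widest transfer in bytes), taken from the table
BUSES = [
    ("Z80 data read/write", 3, 1),
    ("8088 / V20", 4, 1),
    ("8086 / V30", 4, 2),
    ("80286 / 80386 SX", 2, 2),
    ("80386 DX", 2, 4),
]

for label, clocks, width in BUSES:
    print(f"{label:22} {float(min_clocks_per_byte(clocks, width)):g}")
```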

LDIR and the other block instructions are 2 bytes long, so refetching them costs 8 extra clocks per iteration (two 4-clock instruction fetches). Updating the address and count registers also had some overhead.
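
As a rough worked example, the refetch cost can be added to the data transfers and compared with the documented LDIR timing. The 21 and 16 T-state figures (repeating and final iteration) come from Zilog's Z80 documentation, not from the comment, and the function names are illustrative.

```python
FETCH_CLOCKS = 4   # clocks per opcode byte fetched (from the table)
DATA_CLOCKS = 3    # clocks per data byte read or written (from the table)
LDIR_LENGTH = 2    # LDIR is a two-byte opcode (ED B0)

# Documented LDIR timing, an addition not taken from the comment
LDIR_REPEAT_CLOCKS = 21  # iteration that repeats (BC != 0 afterwards)
LDIR_LAST_CLOCKS = 16    # final iteration

def ldir_bus_floor():
    """Clocks per iteration from bus transfers alone: refetch plus read plus write."""
    refetch = LDIR_LENGTH * FETCH_CLOCKS   # the 8 extra clocks
    move = 2 * DATA_CLOCKS                 # one read and one write
    return refetch + move

def ldir_total_clocks(n):
    """Total clocks for LDIR copying n bytes (1 <= n <= 65536)."""
    if not 1 <= n <= 65536:
        raise ValueError("LDIR copies between 1 and 65536 bytes")
    return LDIR_REPEAT_CLOCKS * (n - 1) + LDIR_LAST_CLOCKS

print(ldir_bus_floor())                        # bus transfers only
print(LDIR_REPEAT_CLOCKS - ldir_bus_floor())   # remaining per-iteration overhead
print(ldir_total_clocks(256))
```

The gap between the bus-only figure and the documented 21 clocks is the register update and repeat overhead mentioned above.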

The microcode loop used by the 8086/8088 also had overhead, which was improved in the following generations. Later that loop became somewhat neglected, since compilers and runtime libraries preferred to use sequences of vector instructions instead.

And with modern processors there are a lot of complications due to cache lines and paging, so there's always some unavoidable overhead at the start to align everything properly, even if the transfer rate after that is close to optimal.
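
One way to picture the alignment overhead is to split a copy into an unaligned head, an aligned body and a leftover tail. This is only a sketch: the 64-byte line size, the choice to align on the destination address, and the function name are assumptions, not a description of any particular processor's microcode.

```python
def split_copy(dest_addr, length, align=64):
    """Return (head, body, tail) byte counts for a copy to dest_addr.

    head: bytes copied before the destination reaches an aligned boundary
    body: bytes that can be moved as whole aligned blocks
    tail: bytes left over after the last whole block
    """
    head = min(length, (-dest_addr) % align)
    body = (length - head) // align * align
    tail = length - head - body
    return head, body, tail

print(split_copy(0x1003, 200))
print(split_copy(0x1000, 10))
```

For short copies the head and tail dominate, which is where the fixed startup cost shows up.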


adrian_b 189 days ago

This is correct, but it should be noted that the 2-cycle transfers of the 286/386SX/386DX could normally be achieved only from cache memory (if the motherboard had a cache). DRAM accesses needed at least 1 or 2 wait states, lengthening the access cycles to 3 or 4 clock cycles.

Moreover, the cache memories used with the 286/386SX/386DX were normally write-through, which means that they shortened only the read cycles, not the write cycles. Such caches were very effective at reducing the performance impact of instruction fetching, but they brought little or no improvement to block transfers. The caches were also very small, so any sizable block transfer would flush the entire cache, and then all transfers would be done at DRAM speed.
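
A small model of the point about wait states and write-through caches, assuming every write pays the full DRAM cycle and that a read costs the 2-clock base cycle only on a cache hit. The function names are illustrative, and write buffers or page-mode tricks are ignored.

```python
BASE_CLOCKS = 2  # zero-wait-state bus cycle on the 286/386SX/386DX

def bus_cycle(wait_states):
    """Clocks for one bus transfer with the given number of wait states."""
    return BASE_CLOCKS + wait_states

def copy_clocks_per_unit(dram_wait_states, read_hits_cache):
    """Clocks to move one bus-width unit: one read plus one write."""
    read = bus_cycle(0) if read_hits_cache else bus_cycle(dram_wait_states)
    write = bus_cycle(dram_wait_states)  # write-through: the write always reaches DRAM
    return read + write

for ws in (0, 1, 2):
    hit = copy_clocks_per_unit(ws, read_hits_cache=True)
    miss = copy_clocks_per_unit(ws, read_hits_cache=False)
    print(f"{ws} wait states: {hit} clocks with cached reads, {miss} from DRAM")
```

Even in the best case the cache removes only the wait states on the read half of each copy, and a block larger than the cache loses that too.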


rasz 188 days ago

A 0-wait-state 286 was a pretty standard affair for 8-10 MHz and some 12 MHz gray boxes. Example: https://theretroweb.com/motherboard/manual/g2-12mhz-zero-wai...

"12MHz/0 wait state with 100ns DRAM."

Another: https://theretroweb.com/chip/documentation/neat-6210302843ed...

"The processor can operate at 16MHz with 0.5-0.7 wait state memory accesses, using 100 nsec DRAMs. This is possible through the Page Interleaved memory scheme."
