by clamchowder
808 days ago
I don't know if the CDC 6600 can be considered superscalar. I called it scalar because it can never issue more than one instruction per cycle, and can thus never sustain faster-than-scalar execution. If you use a different definition of superscalar (just having concurrent operations in flight), then I guess that applies to the CDC 6600? Then it'd also apply to any pipelined core, including stuff like the Intel 486.
|
For instance, if Ax is the base address, and Bx is the stride, you could do

  A4 = A4 + B4 ; loading X4 by side-effect
  A3 = A3 + B3 ; loading X3 by side-effect
  X1 = X5 + X6 ; do an add of data loaded in the previous unroll of the loop
  A2 = A2 + B2 ; store X2 by side-effect, presumably computed in previous unroll

All 4 of those fit in one 60-bit word, since each is a 15-bit instruction. And if you are clever with loop unrolling and instruction timing, you can get a lot of overlap.
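
A rough Python toy of the side-effect idea, in case it helps to see it run. This is only a sketch, not an emulator: it assumes the 6600 convention that writing A1 to A5 loads the matching X register and writing A6 or A7 stores it, so the register numbers differ from the listing above. The names Toy6600, set_a, add_x and vector_add are made up for illustration, and no timing, scoreboard, or unrolling is modeled.

```python
# Toy model of CDC 6600 address-register side effects (illustrative only).
# Assumption: writing A1..A5 loads X1..X5, writing A6..A7 stores X6..X7.
LOAD_REGS = {1, 2, 3, 4, 5}
STORE_REGS = {6, 7}

# Each instruction used below is assumed to be one 15-bit parcel,
# so four of them fill a 60-bit word.
PARCEL_BITS = 15
WORD_BITS = 60


class Toy6600:
    """Hypothetical register file plus a flat list standing in for memory."""

    def __init__(self, memory):
        self.mem = list(memory)
        self.A = [0] * 8  # address registers
        self.B = [0] * 8  # index/stride registers
        self.X = [0] * 8  # operand registers

    def set_a(self, i, j, k):
        """Ai = Aj + Bk, then load or store Xi as a side effect."""
        addr = self.A[j] + self.B[k]
        self.A[i] = addr
        if i in LOAD_REGS:
            self.X[i] = self.mem[addr]
        elif i in STORE_REGS:
            self.mem[addr] = self.X[i]
        # i == 0: no memory side effect

    def add_x(self, i, j, k):
        """Xi = Xj + Xk (plain Python addition, not 60-bit arithmetic)."""
        self.X[i] = self.X[j] + self.X[k]


def vector_add(a, b):
    """Compute c[n] = a[n] + b[n] using only the four-instruction loop body."""
    n = len(a)
    m = Toy6600(list(a) + list(b) + [0] * n)
    m.B[1] = 1          # stride
    m.A[1] = -1         # one stride before a[0]
    m.A[2] = n - 1      # one stride before b[0]
    m.A[6] = 2 * n - 1  # one stride before c[0]
    for _ in range(n):
        m.set_a(1, 1, 1)  # A1 = A1 + B1 ; loads X1
        m.set_a(2, 2, 1)  # A2 = A2 + B1 ; loads X2
        m.add_x(6, 1, 2)  # X6 = X1 + X2
        m.set_a(6, 6, 1)  # A6 = A6 + B1 ; stores X6
    return m.mem[2 * n:]


if __name__ == "__main__":
    print(vector_add([1, 2, 3], [10, 20, 30]))
    print(4 * PARCEL_BITS == WORD_BITS)
```

The loop body has no explicit load or store instruction at all; memory traffic happens purely because an A register was written, which is the trick the listing above is relying on.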
The 6600 had dirt-simple opcodes that took about a week to memorize, but... that was a long time ago, so sorry, I can no longer assemble machine code in my head. Memory fading...