by Sesse__
535 days ago
Yes, internally fxch is a register rename, _and_ fxch can go in the V-pipe and takes only one cycle (Pentium has two pipes, U and V). IIRC fadd and fmul were both 3/1 (three cycles latency, one cycle throughput), so you'd start an operation, use the free fxch to get something else to the top, and then do two other operations while you were waiting for the first one to finish. That way, you could get long strings of FPU operations at effectively 1 op/cycle if you planned things well. IIRC, MSVC did a pretty good job of it, too. GCC didn't, really (and thus Pentium GCC was born).
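
To make the scheduling argument concrete, here is a rough toy model in Python, not a cycle-accurate Pentium simulator. It assumes the 3/1 fadd timing recalled above, in-order issue of at most one FP op per cycle, and an fxch that costs nothing because it pairs in the V-pipe. The names `run`, `serial` and `interleaved` are made up for this sketch, and st(3) just stands in for some value that is already available.

```python
LATENCY = 3  # assumed fadd latency in cycles (the "3/1" figure above)

def run(program, depth=4):
    """Return the cycle at which the last result becomes ready."""
    stack = list(range(depth))         # stack[i] is the tag currently named st(i)
    ready = {tag: 0 for tag in stack}  # cycle at which each tag's value is ready
    cycle = 0                          # earliest cycle the next FP op can issue
    for op, i in program:
        if op == "fxch":
            # Pure rename: swap the names, spend no cycle (assumed free).
            stack[0], stack[i] = stack[i], stack[0]
        else:
            # "fadd": st(0) = st(0) + st(i); stall until both inputs are ready.
            issue = max(cycle, ready[stack[0]], ready[stack[i]])
            ready[stack[0]] = issue + LATENCY
            cycle = issue + 1          # throughput of one op per cycle
    return max(ready.values())

# Six dependent adds into one accumulator: every add waits on the previous one.
serial = [("fadd", 3)] * 6

# Six adds spread over three accumulators, rotated to the top with fxch.
interleaved = [
    ("fadd", 3), ("fxch", 1),
    ("fadd", 3), ("fxch", 2),
    ("fadd", 3), ("fxch", 1),
    ("fadd", 3), ("fxch", 2),
    ("fadd", 3), ("fxch", 1),
    ("fadd", 3),
]

print(run(serial), run(interleaved))  # 18 8
```

Under those assumptions the interleaved version issues one fadd per cycle, which is the "effectively 1 op/cycle" case; the serial version pays the full three-cycle latency on every add.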
|
Compilers got pretty good at optimizing straight-line math, but were not as good at cases where variables needed to be kept on the stack during a loop, like a running sum. You had to get the order of exchanges just right to preserve stack order across loop iterations. The compilers at the time often had to spill to memory or use multiple FXCHs at the end of the loop.
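
The loop constraint can be sketched the same way. This is only an illustration in Python with hypothetical helper names (`apply_fxch`, `fixup`): it tracks which value sits in which stack slot, applies the exchanges of a loop body, and works out which extra FXCHs would be needed at the bottom of the loop to get back to the layout the top of the loop expects.

```python
def apply_fxch(stack, swaps):
    """Apply a sequence of `fxch st(i)` to a stack layout (index 0 is the top)."""
    s = list(stack)
    for i in swaps:
        s[0], s[i] = s[i], s[0]
    return s

def fixup(stack, target):
    """Return fxch indices that restore `target`, using only swaps with st(0)."""
    s = list(stack)
    out = []
    while s != target:
        home = target.index(s[0])  # slot where the current top belongs
        if home == 0:
            # Top is already right; pull any misplaced slot up to the top.
            home = next(i for i in range(1, len(s)) if s[i] != target[i])
        s[0], s[home] = s[home], s[0]
        out.append(home)
    return out

entry = ["sum", "x", "y"]                         # layout expected at loop entry

careless = apply_fxch(entry, [1, 2])              # body leaves the stack permuted
print(careless, fixup(careless, entry))           # ['y', 'sum', 'x'] [2, 1]

balanced = apply_fxch(entry, [1, 2, 1, 2, 1, 2])  # body returns to entry layout
print(balanced, fixup(balanced, entry))           # ['sum', 'x', 'y'] []
```

The first body needs two fix-up exchanges before the back edge; the second was ordered so that the running sum ends up on top again and nothing extra is needed.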