Hacker News
by wren6991 743 days ago
Double pumping loses its one-cycle latency when you forward to an operation that is genuinely 512 bits wide, like a shuffle. At that point the consumer has to wait for both halves to be available before it can dispatch. That's another advantage of going a full 512 bits wide.
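To make the timing concrete, here's a rough toy model, not a description of any particular core. It assumes a double-pumped unit produces the low 256-bit half after `latency` cycles and the high half one cycle later, and that a lane-wise consumer is itself double pumped, so it can start on the low half alone. The function names and the one-cycle gap between halves are my own assumptions for illustration.

```python
def ready_cycles(issue_cycle, latency, double_pumped):
    """Return (low_ready, high_ready) cycles for a 512-bit result.

    Assumption: a double-pumped unit finishes the high half one cycle
    after the low half; a native 512-bit unit finishes both together.
    """
    low_ready = issue_cycle + latency
    high_ready = low_ready + 1 if double_pumped else low_ready
    return low_ready, high_ready


def earliest_dispatch(low_ready, high_ready, needs_both_halves):
    """Earliest cycle a dependent operation can dispatch.

    A lane-wise consumer (assumed double pumped itself) can start on the
    low half as soon as it is ready. A cross-lane consumer such as a
    shuffle must wait for the later of the two halves.
    """
    return max(low_ready, high_ready) if needs_both_halves else low_ready


if __name__ == "__main__":
    for name, pumped in (("double-pumped", True), ("native 512-bit", False)):
        low, high = ready_cycles(issue_cycle=0, latency=1, double_pumped=pumped)
        lane_wise = earliest_dispatch(low, high, needs_both_halves=False)
        shuffle = earliest_dispatch(low, high, needs_both_halves=True)
        print(f"{name}: lane-wise consumer at cycle {lane_wise}, "
              f"shuffle at cycle {shuffle}")
```

Under these assumptions the lane-wise consumer dispatches at the same cycle either way, while the shuffle dispatches one cycle later on the double-pumped unit than on the native one.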