Hacker News
by kragen 662 days ago
no, you're right, you almost always need pipelining to get one instruction per clock cycle

but there are a lot of cpus out there (maybe the majority now) that are pipelined microarchitectures that get close to one instruction per cycle with little or no branch prediction: avrs, most of the cortex-m* (not all: the m7 is dual-issue superscalar with a branch predictor), most modern implementations of old slow isas like the z80 and 8051, etc. big processors like your laptop cpu and cellphone cpu are of course superscalar, but they are a tiny minority of all processors. even inside the cellphone case, they're outnumbered by in-order scalar microcontrollers

without prediction, of course, you have a pipeline bubble every time you have a branch, so you never quite hit 1 ipc with scalar execution. but it's usually pretty close, and even with prediction, sometimes you miss. and usually if you have branch prediction, you also have a cache, because ain't nobody got time to share one triflin memory bus between instructions and data

so pipelining gets you from, say, 3 clocks per instruction down to 1.2 or so. then prediction gets you from 1.2 down to, say, 1.02
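
a rough sketch of where numbers like those can come from, using the usual back-of-envelope model: cpi = 1 + (fraction of instructions that are branches) × (bubble cycles per branch) × (fraction of branches that actually pay the bubble). the specific inputs below (20% branches, a 1-cycle bubble, 90% prediction accuracy) are assumptions picked to reproduce the 1.2 and 1.02 figures, not measurements of any real cpu, and `cpi` is just a hypothetical helper name

```python
def cpi(branch_fraction, bubble_cycles, miss_rate=1.0):
    """Clocks per instruction for an idealized scalar pipeline.

    branch_fraction: share of executed instructions that are branches (assumed)
    bubble_cycles:   cycles lost each time a branch stalls the pipeline (assumed)
    miss_rate:       share of branches that pay the bubble; 1.0 models
                     "no prediction", 0.1 models a 90%-accurate predictor
    """
    return 1.0 + branch_fraction * bubble_cycles * miss_rate


unpipelined = 3.0                      # the "say, 3 clocks per instruction" baseline
no_prediction = cpi(0.20, 1)           # every branch costs one bubble
with_prediction = cpi(0.20, 1, 0.10)   # only mispredicted branches cost a bubble

print(f"pipelined, no prediction:   {no_prediction:.2f} cpi")
print(f"pipelined, with prediction: {with_prediction:.2f} cpi")
print(f"speedup from pipelining:    {unpipelined / no_prediction:.2f}x")
print(f"speedup from prediction:    {no_prediction / with_prediction:.2f}x")
```

under these assumptions pipelining is the big win (about 2.5x) and prediction adds roughly another 18%. the payoff from prediction grows with deeper pipelines, where each bubble costs more cycles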