by igodard 3255 days ago
The Mill has no register file; internally it is essentially a bypass network. That network has source × sink complexity (not N^2, because the number of sources need not match the number of sinks). You are right that the cross product is a limit on Mill scaling. Our internal work gives us some confidence that we can handle 30-wide issue with tolerable clock impact. Beyond that is unclear, and we may well hit other constraints that preclude going further; memory bandwidth is a likely one.
2 comments

What makes you think you can extract that much instruction level parallelism in the first place?
One major source is speculation. Each Mill operand, or each element of a vector operand, has a 'Not a Result' (NaR) flag.
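To make the NaR idea concrete, here is a rough Python sketch of how a per-operand flag could let work run speculatively. It is not Mill code. It assumes that an operation which would fault instead yields a NaR, that NaR propagates through later arithmetic, and that the fault is only raised when a NaR reaches a non-speculative operation such as a store. The names Operand, spec_div, spec_add and store are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    """A value plus a 'Not a Result' flag (hypothetical model)."""
    value: int = 0
    nar: bool = False


def spec_div(a: Operand, b: Operand) -> Operand:
    # Assumption: a faulting speculative op yields NaR instead of trapping.
    if a.nar or b.nar or b.value == 0:
        return Operand(nar=True)
    return Operand(a.value // b.value)


def spec_add(a: Operand, b: Operand) -> Operand:
    # Assumption: NaR propagates through ordinary arithmetic.
    if a.nar or b.nar:
        return Operand(nar=True)
    return Operand(a.value + b.value)


def store(memory: dict, addr: int, x: Operand) -> None:
    # Assumption: only a non-speculative use turns NaR into a fault.
    if x.nar:
        raise ArithmeticError("NaR reached a non-speculative operation")
    memory[addr] = x.value


# Both arms of "d != 0 ? n / d + 1 : -1" are computed before the condition
# is consulted; the bad arm just carries a NaR that is never stored.
n, d = Operand(10), Operand(0)
taken = spec_add(spec_div(n, d), Operand(1))   # NaR, but no trap yet
chosen = taken if d.value != 0 else Operand(-1)
memory = {}
store(memory, 0, chosen)                       # safe: chosen is not a NaR
```

The point of the sketch is that the divide can be issued ahead of the test that guards it, which is where the extra parallelism would come from under these assumptions.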
You still have quadratic complexity in the network; the "need not match" argument only reduces it by a constant multiplier.
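A small Python sketch of the arithmetic behind both comments. It assumes a plain crossbar model in which every source can feed every sink, and that sources and sinks each grow in proportion to issue width. The per-slot counts used here are invented for illustration and are not Mill figures.

```python
def crosspoints(sources: int, sinks: int) -> int:
    """Connections in a full source-to-sink network (simple crossbar model)."""
    return sources * sinks


def network_size(width: int, sources_per_slot: int, sinks_per_slot: int) -> int:
    # Assumption: both sides scale linearly with the number of issue slots.
    return crosspoints(width * sources_per_slot, width * sinks_per_slot)


# Hypothetical ratio: 1 result source and 2 operand sinks per issue slot.
for width in (10, 20, 30):
    print(width, network_size(width, sources_per_slot=1, sinks_per_slot=2))
```

With these made-up ratios the count is 2 × width², so doubling the width quadruples the network. The ratio between sources and sinks changes the constant, not the exponent.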