by gpderetta 1926 days ago
I think you are right that mull/h are fused. I think the M1 has 128 ALUs for the vector unit, so fusion would be a good way to make use of them. The M1 is far from the first iteration of the architecture, and Apple has likely picked most if not all of the low-hanging fruit. It also helps x86 emulation, I guess.

edit: but see the comment elsewhere in the thread about the loop iteration time being off by a factor of 2.

1 comments

Oh yeah, I thought the add r,r,2 was odd but didn't investigate. This brings things back to ~2+ cycles per iteration, which strictly speaking does not require fusion.

It would be easier to test this explicitly instead of inside some unrelated RNG.
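
For what it's worth, a standalone test could look roughly like the sketch below. It is only an illustration, not anything from the original benchmark: the names (mul_fold, mul_lo, run_lo, run_fold, report), the constants and the lane count are all made up here. It assumes a GCC or Clang style compiler with unsigned __int128, and it assumes the compiler turns the 128-bit product into a low/high multiply pair, which should be checked in the disassembly. It times several independent multiply chains with and without the high half, so the two per-multiply costs can be compared directly.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define LANES 8  /* hypothetical number of independent chains */

/* Full 64x64 -> 128-bit product, folded to 64 bits.
   Needs both the low and the high half of the product. */
static inline uint64_t mul_fold(uint64_t a, uint64_t b) {
    unsigned __int128 p = (unsigned __int128)a * b;
    return (uint64_t)p ^ (uint64_t)(p >> 64);
}

/* Baseline: low half of the product only. */
static inline uint64_t mul_lo(uint64_t a, uint64_t b) {
    return a * b;
}

/* LANES independent chains using only the low multiply. */
uint64_t run_lo(uint64_t n) {
    uint64_t x[LANES];
    for (int j = 0; j < LANES; j++) x[j] = 0x9E3779B97F4A7C15ull + (uint64_t)j;
    for (uint64_t i = 0; i < n; i++)
        for (int j = 0; j < LANES; j++)
            x[j] = mul_lo(x[j], 0xD1342543DE82EF95ull) + i;
    uint64_t r = 0;
    for (int j = 0; j < LANES; j++) r ^= x[j];
    return r;  /* checksum keeps the work from being optimized away */
}

/* Same chains, but each step needs low and high halves. */
uint64_t run_fold(uint64_t n) {
    uint64_t x[LANES];
    for (int j = 0; j < LANES; j++) x[j] = 0x9E3779B97F4A7C15ull + (uint64_t)j;
    for (uint64_t i = 0; i < n; i++)
        for (int j = 0; j < LANES; j++)
            x[j] = mul_fold(x[j], 0xD1342543DE82EF95ull) + i;
    uint64_t r = 0;
    for (int j = 0; j < LANES; j++) r ^= x[j];
    return r;
}

/* Time both kernels with clock() and print nanoseconds per chain step. */
void report(uint64_t n) {
    clock_t t0 = clock();
    uint64_t a = run_lo(n);
    clock_t t1 = clock();
    uint64_t b = run_fold(n);
    clock_t t2 = clock();
    double scale = 1e9 / (double)CLOCKS_PER_SEC / ((double)n * LANES);
    printf("low only : %.3f ns/step (checksum %llx)\n",
           (double)(t1 - t0) * scale, (unsigned long long)a);
    printf("low+high : %.3f ns/step (checksum %llx)\n",
           (double)(t2 - t1) * scale, (unsigned long long)b);
}
```

Calling report(100000000) from main and converting ns to cycles with the known core clock gives the two numbers to compare. Similar figures for both kernels would be consistent with fusion but would not prove it, since enough multiply units would look the same. The lane count also matters: with too few lanes this measures latency rather than throughput.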