Which (assuming 1 multiply-add = 2 operations) is the same number of int8 operations per cycle as the Apple M1 manages in float16 operations per cycle. Intel's 16-bit operations might run at the same rate, but I'd guess half. Intel's unit will almost certainly run at a higher clock speed, and there's one per core rather than one per four P-cores. (And I think Apple might have doubled their throughput in M2. As you said, performance comparison is hard.)