And if you can get away with it and don't mind a lot of bit shifting, it's even better to work with a Uint32Array, which packs all 3 color channels and alpha into one element and cuts your loop iterations by 4x.
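A minimal sketch of that idea, assuming a little-endian host and a made-up two-pixel buffer (in a browser the bytes would typically come from `imageData.data`). The names `pixels8` and `pixels32` are just for illustration:

```javascript
// Hypothetical 2x1 RGBA image; stands in for canvas ImageData bytes.
const pixels8 = new Uint8ClampedArray([255, 0, 0, 255, 0, 128, 255, 255]);

// View the same underlying buffer as one 32-bit element per pixel (no copy).
const pixels32 = new Uint32Array(
  pixels8.buffer,
  pixels8.byteOffset,
  pixels8.byteLength / 4
);

// Invert R, G and B while leaving alpha alone: one iteration per pixel.
// Assumes little-endian, where bytes R,G,B,A in memory read as 0xAABBGGRR.
for (let i = 0; i < pixels32.length; i++) {
  pixels32[i] ^= 0x00ffffff;
}
```

Because both arrays share one buffer, the change is visible through `pixels8` too, so nothing has to be copied back.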
Because matching the native 32-bit word size is better for the prefetcher, right?
Wouldn't most CPUs these days be smart enough to detect advancing the index by 4, and then using offsets?
So:
    for (let i = 0; i < someUint8Array.length; i += 4) {
        let R = someUint8Array[i], G = someUint8Array[i + 1],
            B = someUint8Array[i + 2], A = someUint8Array[i + 3];
        // ... manipulations here
    }
I'm honestly not sure why, but across all browsers on both ARM and x86_64 arches, the Uint32Array version was almost 3x faster than doing what you wrote.
I have a feeling it's more of a JS JIT thing than a CPU prefetcher thing, but honestly I'm not really sure.
In the program I linked above, it was actually faster to use Uint32Array everywhere, with one function to pull the 4 color values out of an element and another to pack the 4 values back into a uint32.
Granted, it's been over a year since I last benchmarked that code, but I did reuse some of the image code recently and found iterating over a Uint32Array to be significantly faster. (And funnily enough, manually unrolling the Uint32Array loop into something similar to what you wrote gave an additional performance boost, but it was small enough to not be worth the extra weirdness in the code to me.)
If it helps, I took the class I made that converts between 4 Uint8Clamped values and a Uint32 value, and published it as an npm package at [0].
At the very least it can show you some of the gotchas with bit shifting in JS (like how values often look negative until they are placed into a Uint32Array, at which point they become positive integers, and how you need to check for endianness).
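To illustrate those two gotchas, here is a rough sketch. `isLittleEndian`, `packRGBA` and `unpackRGBA` are hypothetical names for this example, not the API of the package at [0]:

```javascript
// Detect host byte order: write a known 32-bit value, look at its first byte.
const isLittleEndian =
  new Uint8Array(new Uint32Array([0x11223344]).buffer)[0] === 0x44;

// Pack four 0-255 channel values so the bytes land in memory as R,G,B,A.
function packRGBA(r, g, b, a) {
  const packed = isLittleEndian
    ? (a << 24) | (b << 16) | (g << 8) | r
    : (r << 24) | (g << 16) | (b << 8) | a;
  // Bitwise operators yield signed 32-bit results, so a high top byte
  // looks negative; >>> 0 reinterprets the same bits as unsigned.
  return packed >>> 0;
}

// Split a packed value back into [r, g, b, a].
function unpackRGBA(value) {
  const b0 = value & 0xff;
  const b1 = (value >>> 8) & 0xff;
  const b2 = (value >>> 16) & 0xff;
  const b3 = value >>> 24;
  return isLittleEndian ? [b0, b1, b2, b3] : [b3, b2, b1, b0];
}

console.log(255 << 24);         // -16777216 (signed view of 0xFF000000)
console.log((255 << 24) >>> 0); // 4278190080 (same bits, unsigned)
```

Storing the signed value into a Uint32Array element performs the same conversion implicitly, which is why the numbers only look negative while they are plain JS numbers.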