Could save a couple of cycles per iteration by preloading the shift amounts into several GPRs before entering the loop, instead of initializing them just before each use inside it. The amounts don't change between iterations, so the setup only needs to run once.
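A minimal C sketch of the suggestion. It assumes a hypothetical loop that extracts the four bytes of each 32-bit word with shifts; the names `sum_bytes_reloaded` and `sum_bytes_hoisted` are illustrative and do not come from the original code. The first version reassigns one shift-count variable before every shift, which mirrors an instruction sequence that reloads the count register on every iteration. The second sets up all four counts once, before the loop, as if each lived in its own GPR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before": a single shift-count variable is reinitialized
   immediately before each shift, on every iteration of the loop. */
static uint32_t sum_bytes_reloaded(const uint32_t *words, size_t n) {
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t w = words[i];
        unsigned s; /* stands in for one register reused for every count */
        s = 0;  total += (w >> s) & 0xFFu;
        s = 8;  total += (w >> s) & 0xFFu;
        s = 16; total += (w >> s) & 0xFFu;
        s = 24; total += (w >> s) & 0xFFu;
    }
    return total;
}

/* Hypothetical "after": the loop-invariant shift counts are set up once,
   before the loop, each in its own variable (one GPR apiece in assembly). */
static uint32_t sum_bytes_hoisted(const uint32_t *words, size_t n) {
    const unsigned s0 = 0, s1 = 8, s2 = 16, s3 = 24;
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t w = words[i];
        total += (w >> s0) & 0xFFu;
        total += (w >> s1) & 0xFFu;
        total += (w >> s2) & 0xFFu;
        total += (w >> s3) & 0xFFu;
    }
    return total;
}
```

An optimizing C compiler will usually fold both versions into the same machine code, so the sketch only illustrates the structure of the change. The saving described above applies to hand-written assembly, where each reassignment is a real instruction executed every iteration. It also assumes there are enough free GPRs to keep every count live for the whole loop.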