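You can use Hyperfine [1] instead of time for a nicer CLI benchmark: it runs each command multiple times and reports the mean and standard deviation, with optional warmup runs.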
I'd also be curious to know whether pillow-simd [2] gets the Python performance closer to Go/Rust, and whether using Rayon [3] and changing the .iter() calls in your Rust code to .par_iter() would yield an improvement there.