It is really simple and works with unaligned data.
But it does not do well in benchmarks. I wonder if I should use another one.