With SIMD instructions like AVX2 you can process 32 characters in parallel. With AVX-512, 64.
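It's pretty easy to write something way faster by not using a LUT: a per-byte table lookup is hard to vectorize, while plain compares and arithmetic map directly onto SIMD lanes.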
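
To make that concrete, here's a rough sketch of a LUT-free loop. I'm assuming the operation is something like ASCII uppercasing, which the comment above doesn't actually specify, and `ascii_upper` is just a name made up for illustration. There are no intrinsics here; the point is that a branchless compare-and-subtract body is the kind of loop GCC and Clang can usually auto-vectorize when built with optimization and something like `-mavx2`, though whether they do depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example (not from the original comment):
 * uppercase ASCII letters in place, with no lookup table. */
void ascii_upper(uint8_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t c = buf[i];
        /* 1 if c is in 'a'..'z', else 0. The cast to uint8_t makes
         * values below 'a' wrap to large numbers, so one compare
         * covers both ends of the range. */
        uint8_t is_lower = (uint8_t)(c - 'a') < 26;
        /* 'a' - 'A' == 32, so subtract 32 only for lowercase letters. */
        buf[i] = (uint8_t)(c - (is_lower << 5));
    }
}
```

Every byte goes through the same few operations with no data-dependent branch or memory access, which is what lets one vector instruction handle 32 (or 64) bytes at a time.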