It's not the best-documented option, but it's the smallest implementation while still being one of the most performant, so you can learn more from it than just SSE.