It seems to reach only a little above half the theoretical speed, and scale only up to 32 threads for some reason. Might be a temporary software limitation or something more fundamental.