It’s a lot more work to push data to a GPU or NPU than to just do a couple of vector ops. Crypto is important enough that many architectures have hardware accelerators just for that.
For servers, no, but we’re talking about endpoints here. Also, this isn’t only about reducing the existing vector bandwidth but also about not increasing it outside of dedicated co-processors.