Also look at BearSSL: https://www.bearssl.org/constanttime.html. The 2.4 GB per second it reports for AES-NI is comparable to my own measurements of ChaCha20 with 256-bit AVX2.
ChaCha is slightly faster than Salsa20, mostly because it removed some of the word shuffling that Salsa20 needed for matrix transposition.
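To make the layout point concrete, here is a minimal scalar sketch of the ChaCha20 block function, assuming the RFC 8439 state layout (4 constant words, 8 key words, a 32-bit counter, a 96-bit nonce). The names `rotl32`, `quarter_round`, `qr_state` and `chacha20_block` are my own, not any library's API. The point is that the column round and the diagonal round both index the same row-major 4x4 state. In a SIMD version that keeps one row per register, the diagonal round only needs rows b, c and d rotated by one, two and three lanes; no full transpose is required. This scalar code doesn't demonstrate the speedup itself, it just shows the indexing.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

/* ChaCha quarter round (RFC 8439, section 2.1): add-rotate-xor on four words. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Apply the quarter round to four state words selected by index. */
static void qr_state(uint32_t s[16], int a, int b, int c, int d) {
    quarter_round(&s[a], &s[b], &s[c], &s[d]);
}

/* Little-endian load/store helpers. */
static uint32_t load32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store32_le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Produce one 64-byte ChaCha20 keystream block. */
void chacha20_block(const uint8_t key[32], uint32_t counter,
                    const uint8_t nonce[12], uint8_t out[64]) {
    uint32_t in[16], x[16];
    /* "expand 32-byte k" constants. */
    in[0] = 0x61707865; in[1] = 0x3320646e; in[2] = 0x79622d32; in[3] = 0x6b206574;
    for (int i = 0; i < 8; i++) in[4 + i] = load32_le(key + 4 * i);
    in[12] = counter;
    for (int i = 0; i < 3; i++) in[13 + i] = load32_le(nonce + 4 * i);

    memcpy(x, in, sizeof x);
    for (int i = 0; i < 10; i++) {          /* 10 double rounds = 20 rounds */
        /* Column round: each quarter round runs down one column. */
        qr_state(x, 0, 4,  8, 12);
        qr_state(x, 1, 5,  9, 13);
        qr_state(x, 2, 6, 10, 14);
        qr_state(x, 3, 7, 11, 15);
        /* Diagonal round: same row-major layout, no transposition. */
        qr_state(x, 0, 5, 10, 15);
        qr_state(x, 1, 6, 11, 12);
        qr_state(x, 2, 7,  8, 13);
        qr_state(x, 3, 4,  9, 14);
    }
    /* Feed-forward: add the input state, serialize little-endian. */
    for (int i = 0; i < 16; i++) store32_le(out + 4 * i, x[i] + in[i]);
}
```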