Sadly, if you use the latest stream cipher that crypto people are hot on, and that will often be the default in TLS 1.3, you’ll get no benefit :( ARM64 only has instructions for AES.
ChaCha can be accelerated with generic SIMD instructions on any arch that supports them. Personally I think that's preferable to algorithm-specific instructions like AES-NI.
Yeah, that's what using hardware-specific instructions will get you. I haven't read the changelog; I'm assuming so based on the fact that it's ARM and on the size of the improvement.
Can someone explain the apparent contradiction here? Hardware and software keep trying to make crypto computation faster, while crypto algorithms (like argon2?) keep trying to be harder and harder to run fast.
Password hashing algorithms like argon2/scrypt/bcrypt are a special case. The security of a hashed password depends on how much work an attacker has to do until the correct password is found.
Suppose an attacker has to try 1,000,000 passwords until the correct one is found. If the password hashing algorithm is fast, so the attacker can try 1,000 guesses per second, it will take the attacker about 17 minutes. If the password hashing algorithm is slow, so the attacker takes a whole second to try each guess, it will take the attacker more than 11 days. With a complex enough password, and a slow enough password hashing algorithm, the attacker will die of old age before the correct password is found.
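To make the arithmetic concrete, here is a quick sketch in Python. The helper name `seconds_to_exhaust` is made up for illustration, and the guess rates are the ones from the example above, not measurements of any real algorithm.

```python
def seconds_to_exhaust(candidates: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate password at a given guess rate."""
    return candidates / guesses_per_second


CANDIDATES = 1_000_000

fast = seconds_to_exhaust(CANDIDATES, 1_000)  # fast hash: 1,000 guesses/second
slow = seconds_to_exhaust(CANDIDATES, 1)      # slow hash: 1 guess/second

print(f"fast hash: {fast / 60:.1f} minutes")   # fast hash: 16.7 minutes
print(f"slow hash: {slow / 86_400:.1f} days")  # slow hash: 11.6 days
```

The defender only pays the per-hash cost once per login, while the attacker pays it once per guess, which is why a 1,000x slowdown hurts the attacker far more.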
But that's not the case for other classes of crypto algorithms. For symmetric encryption algorithms like AES or ChaCha20, hashing algorithms like SHA-2 or Blake2, elliptic curve algorithms like secp256r1 or curve25519, and so on, being faster is a good thing.
To be more exact, what you want is not a slow password hashing algorithm. What you want is an algorithm that the attacker cannot run any more efficiently than the defender can. The key is that you do not want an attacker to be able to use custom hardware (GPUs, FPGAs, ASICs) or distributed compute to gain an efficiency advantage.
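One common way to narrow that gap is memory-hardness, which is the approach scrypt and argon2 take. A rough sketch using Python's standard `hashlib.scrypt` follows. The helper names and the 8 GiB budget are made up for illustration, the parameters (`n=2**14`, `r=8`, `p=1`) are just a commonly seen choice rather than a recommendation, and `128 * n * r` is only the approximate size of scrypt's main working buffer.

```python
import hashlib
import os


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate working memory of one scrypt evaluation (128 * n * r bytes)."""
    return 128 * n * r


def hash_password(password: bytes, salt: bytes,
                  n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    """Derive a 32-byte hash; n and r set the memory cost of every guess."""
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p, dklen=32)


def parallel_guesses(memory_budget_bytes: int, n: int, r: int) -> int:
    """How many guesses fit in a memory budget at once (hypothetical attacker)."""
    return memory_budget_bytes // scrypt_memory_bytes(n, r)


salt = os.urandom(16)
digest = hash_password(b"correct horse battery staple", salt)

print(len(digest), "byte hash")
print(scrypt_memory_bytes(2**14, 8) // 2**20, "MiB per guess")
print(parallel_guesses(8 * 2**30, 2**14, 8), "guesses fit in 8 GiB at once")
```

The point is that each guess needs its own chunk of RAM, so an attacker with custom silicon can't just pack in thousands of tiny hashing cores the way they could for a plain fast hash.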