I'd like to see the benchmarks, including power draw. I suspect it is similar to soft ChaCha vs hard AES. A ChaCha software implementation can achieve speed similar to hardware AES, at the cost of significantly higher power draw from pushing the AVX units to near-maximum utilization.
I benchmarked hardware AES vs software ChaCha20, and the former improved the overall performance of an end-to-end QUIC software stack by more than 50%. The pure crypto difference is probably even higher. That's a huge gap, even though it's entirely possible that the ChaCha20 implementation of Ring could still be improved.
As a result, I asked for rustls to default to AES instead of the previous ChaCha20 default [1].
Some implementations (IIRC Firefox is one of them) choose the order dynamically: if hardware AES is available, AES is ordered before ChaCha20; otherwise ChaCha20 is ordered first. Last time I looked, at least for Intel, the i3 didn't have hardware AES (but the i5 and i7 did), so it was not uncommon for lower-end hardware (which needs speed the most) to have AES slower than ChaCha20.
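To make the dynamic ordering concrete, here is a rough sketch in Python. It is not how Firefox or rustls actually does it; the function names `has_hardware_aes` and `order_cipher_suites` are made up for illustration, and detecting AES support by parsing Linux `/proc/cpuinfo` text is just one assumed approach. The suite names are the standard TLS 1.3 identifiers.

```python
# Hypothetical sketch: prefer AES-GCM when the CPU advertises AES instructions,
# otherwise prefer ChaCha20-Poly1305. Names here are illustrative assumptions.

AES_SUITES = ["TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"]
CHACHA_SUITES = ["TLS_CHACHA20_POLY1305_SHA256"]


def has_hardware_aes(cpuinfo_text: str) -> bool:
    """Return True if /proc/cpuinfo-style text lists an 'aes' CPU flag.

    x86 Linux reports flags under 'flags', ARM Linux under 'Features'.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features"):
            if "aes" in value.split():
                return True
    return False


def order_cipher_suites(hw_aes: bool) -> list:
    """Order suites by expected speed: AES first only with hardware support."""
    if hw_aes:
        return AES_SUITES + CHACHA_SUITES
    return CHACHA_SUITES + AES_SUITES
```

On Linux one could call `order_cipher_suites(has_hardware_aes(open("/proc/cpuinfo").read()))`; other platforms would need a different detection method (e.g. CPUID), which this sketch does not cover.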
It should be noted that chacha20 has an insanely comfortable security margin. By a more accurate estimate, chacha8 has a security margin similar to that of AES, and chacha20 has 2.5 times as many rounds, so consider whether that extra margin is worth the cost.