Looks like clang does the optimization with -mno-strict-align, but gcc doesn't seem to fully support that yet: https://godbolt.org/z/xn8erKMTe
Edit: Apparently gcc's default tuning assumes that unaligned loads aren't supported (or aren't worth using), so -mno-strict-align alone doesn't enable them, but with -mtune=size it does emit them: https://godbolt.org/z/d9P19aMnn
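For anyone who can't open the links, here is a hypothetical example of the kind of code where this matters (not necessarily what the godbolt snippets contain): a byte-by-byte little-endian 32-bit load from a possibly unaligned pointer. If the compiler is allowed to use unaligned accesses, it can merge the four byte loads into a single 32-bit load (one `lw` on RISC-V, which is little-endian). Otherwise it has to keep four `lbu` loads plus shifts and ors. The function names are made up for illustration, and the exact codegen depends on compiler version and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a little-endian uint32_t from p, which may be
   unaligned. Written with byte loads so it is valid on any target; a compiler
   that allows unaligned access may combine these into one 32-bit load. */
uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read a native-endian uint32_t via memcpy. This is the
   portable way to express an unaligned load; whether it becomes a single load
   or a byte sequence again depends on the alignment settings. */
uint32_t load_u32_memcpy(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

To compare, something like `clang --target=riscv64-unknown-linux-gnu -O2 -mno-strict-align` versus `riscv64-linux-gnu-gcc -O2 -mno-strict-align`, with and without `-mtune=size`, should show the difference described above.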