A lot of that footage originates on smartphones, GoPros, drone cameras, etc., where hardware and power limitations don't allow running expensive encoding algorithms.
So the smartphone does a crummy H.264 encode that's bit-expensive but power-efficient, then the YouTube server saves the bits a million times over by transcoding to AV1. There's still room for at least 2 codecs on the Pareto curve.
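
To make the Pareto point concrete, here is a rough sketch. The numbers are invented for illustration and are not measurements of any real encoder; the `Codec` type, the `pareto_front` helper, and the third "dominated" entry are all hypothetical. It only shows that a cheap, bit-hungry encode and an expensive, bit-efficient encode can both sit on the frontier, because neither beats the other on both axes.

```python
from typing import NamedTuple


class Codec(NamedTuple):
    name: str
    encode_cost: float  # relative encode energy per minute of video (made-up number)
    bitrate: float      # relative bitrate at equal quality (made-up number)


def pareto_front(codecs):
    """Keep the codecs that no other codec beats on both axes at once."""
    front = []
    for c in codecs:
        dominated = any(
            o.encode_cost <= c.encode_cost
            and o.bitrate <= c.bitrate
            and (o.encode_cost < c.encode_cost or o.bitrate < c.bitrate)
            for o in codecs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.encode_cost)


# Illustrative values only: lower is better on both axes.
CODECS = [
    Codec("fast H.264 (phone)", 1.0, 2.0),
    Codec("slow AV1 (server)", 50.0, 1.0),
    Codec("hypothetical dominated codec", 60.0, 2.5),
]

front_names = [c.name for c in pareto_front(CODECS)]
print(front_names)
```

With these assumed values, the fast H.264 encode and the slow AV1 encode both survive, and only the entry that is worse on both axes drops out.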