It's worth noting that the Cloudflare test was done on a Xeon Silver, which behaves worse around these frequency changes than the Gold or Platinum parts. If you're on Gold or Platinum, you're less likely to suffer the problems that Cloudflare did with mixed workloads.
This seems like an optimisation nightmare. Your program needs to know both which instructions the chip supports and which chip it is within a family, in order to decide whether or not to use certain vector instructions.
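To make that concrete, here is a rough sketch of what such a two-level check might look like, assuming Linux-style /proc/cpuinfo text as input. The function names and the "avoid AVX-512 on Silver" rule are hypothetical, taken only from the claim earlier in this thread, not from any Intel guidance.

```python
# Hypothetical policy: model-name substrings on which we skip AVX-512.
# "Silver" is an assumption based on the comment above, not vendor advice.
AVOID_AVX512_ON = ("Silver",)


def parse_cpuinfo(text):
    """Return (model_name, flags) for the first CPU in /proc/cpuinfo-style text."""
    model = ""
    flags = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        key = key.strip()
        if key == "model name" and not model:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return model, flags


def choose_code_path(text):
    """Pick "avx512", "avx2" or "scalar" from capability AND chip model."""
    model, flags = parse_cpuinfo(text)
    # Capability check: does the chip support AVX-512 at all?
    # Model check: is it a part where we'd rather not use it?
    if "avx512f" in flags and not any(s in model for s in AVOID_AVX512_ON):
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"
```

On a Linux box you could call it as choose_code_path(open("/proc/cpuinfo").read()). Even in this toy form the capability flag alone isn't enough, and the model list has to be maintained by hand, which is pretty much the nightmare.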
I think the reason for reducing clock speed when vector units are in heavy use is to keep power usage in check.
You might also find https://blog.cloudflare.com/on-the-dangers-of-intels-frequen... helpful, which goes into detail about a specific case where dynamic frequency scaling resulted in AVX-512 code running slower than AVX2 code.