Frankly, ARM server chips have always SEEMED like they held a lot of promise but never delivered on it. If you want power efficiency per-thread, I struggle to see the value of an ARM chip vs a voltage-optimized x64 CPU. It definitely seems like there's an unaddressed market here, but I also feel like if Intel/AMD felt that Amazon Graviton or Alibaba were starting to impact their market share they'd just release TDP-binned server chips.
What kind of cost though? Actual silicon/chip cost doesn't matter because it's amortized over so much usage. Energy efficiency matters, but mostly because it affects achievable density. For most server applications latency is king, which is why servers use big x64 chips with all power-saving features disabled. If you have an application that isn't latency sensitive, it might make more sense to just use a big Xeon or EPYC CPU running in a low-voltage mode.
CPUs don't have the same load profile as GPUs. It could be 256, 512, or even 1024 cores, but unless Alibaba has one hell of an innovation on memory bandwidth & consistency, nobody at the big shops is going to care.
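
To put rough numbers on the bandwidth point, here's a back-of-the-envelope sketch. The 400 GB/s socket figure is purely an assumption for illustration, not a spec for any real chip, and the helper names are made up. It just splits a fixed memory bandwidth evenly across cores to show how quickly the per-core share shrinks as the core count grows.

```python
# Hypothetical socket-level memory bandwidth in GB/s.
# This is an assumed round number, not a measurement of any real part.
ASSUMED_SOCKET_BANDWIDTH_GBPS = 400.0


def bandwidth_per_core(total_gbps: float, cores: int) -> float:
    """Return the even share of memory bandwidth per core, in GB/s."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_gbps / cores


def per_core_table(core_counts, total_gbps=ASSUMED_SOCKET_BANDWIDTH_GBPS):
    """Map each core count to its per-core bandwidth share."""
    return {n: bandwidth_per_core(total_gbps, n) for n in core_counts}


if __name__ == "__main__":
    for cores, share in per_core_table([64, 256, 512, 1024]).items():
        print(f"{cores:>5} cores -> {share:.2f} GB/s per core")
```

Real workloads don't split bandwidth evenly and caches absorb a lot of traffic, so treat this as an upper-level intuition only: unless total bandwidth scales with the core count, each added core gets a thinner slice.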