It's highly likely that a workload that is suitable to run on hundreds of disparate computers with thousands of CPU cores is going to be equally well suited for running on tens of thousands of GPU compute threads.
Not necessarily. GPUs simply aren't optimized for branch-heavy or pointer-chasey code. If that describes the inner loop of your workload, it doesn't matter how well you can parallelize it at a higher level: CPU cores are going to be better at it than GPU cores.
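To make the branch part concrete, here's a toy model in Python. It's a sketch, not a benchmark. It assumes the usual SIMT picture: threads are issued in lockstep groups (32 lanes per warp on NVIDIA hardware), and when lanes in a group take different branches, each path runs in turn while the other lanes sit idle. The names `lockstep_cost`, `utilization`, and `cost_of`, and the step counts, are made up for illustration.

```python
WARP_SIZE = 32  # lanes per lockstep group (NVIDIA warp size)

def lockstep_cost(branch_ids, cost_of):
    """Steps for one lockstep group: each distinct branch taken by any
    lane is executed in turn while lanes on other branches are masked off."""
    return sum(cost_of[b] for b in set(branch_ids))

def utilization(branch_ids, cost_of):
    """Fraction of lane-steps that do useful work."""
    useful = sum(cost_of[b] for b in branch_ids)
    return useful / (len(branch_ids) * lockstep_cost(branch_ids, cost_of))

# Hypothetical inner loop with four branch paths, 10 steps each.
cost_of = {0: 10, 1: 10, 2: 10, 3: 10}

uniform = [0] * WARP_SIZE                      # every lane takes the same path
divergent = [i % 4 for i in range(WARP_SIZE)]  # lanes spread over four paths

print(lockstep_cost(uniform, cost_of), utilization(uniform, cost_of))      # 10 1.0
print(lockstep_cost(divergent, cost_of), utilization(divergent, cost_of))  # 40 0.25
```

In this model the divergent group does the same amount of useful work but takes four times as long, so three quarters of the lanes are wasted at any moment. Independent CPU cores would each just run their own 10-step path. Real hardware is more complicated, and the pointer-chasing side (dependent loads that can't be coalesced) isn't modeled here at all.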