CPUs target low latency (they switch often).
GPUs target high throughput (they switch rarely, only when needed).
High-throughput algorithms don't have a problem with a lot of threads (more threads just means more work in flight).
Low-latency algorithms do have a problem with a lot of threads (they need a lot of cache memory, because constant switching evicts each thread's data from the cache).
It's the same story as with HT (Hyper-Threading).
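To make the cache argument above concrete, here is a toy simulation, not a model of any real CPU or GPU. It assumes a fully associative LRU cache, threads that share no data, and round-robin switching where each time slice sweeps the thread's whole working set once. The names `LRUCache` and `hit_rate` are hypothetical. Once the combined working sets no longer fit in the cache, every switch evicts the next thread's data and the hit rate collapses.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def hit_rate(num_threads, working_set, cache_lines, rounds=100):
    """Hit rate when threads are switched round-robin (assumed policy)."""
    cache = LRUCache(cache_lines)
    for _ in range(rounds):
        for t in range(num_threads):           # round-robin switching
            base = t * working_set             # threads share no data
            for offset in range(working_set):  # one time slice = one sweep
                cache.access(base + offset)
    return cache.hits / (cache.hits + cache.misses)


print(hit_rate(1, 16, 64))  # 0.99: one working set fits easily
print(hit_rate(4, 16, 64))  # 0.99: 4 * 16 lines still fit in 64
print(hit_rate(8, 16, 64))  # 0.0: 8 * 16 lines thrash the cache
```

With 8 threads the access pattern is a cycle of 128 distinct lines through a 64-line LRU cache, so every line is evicted before it is reused. A throughput-oriented design tolerates this by keeping enough other work ready to run while a thread waits on memory.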