Same reason we have OLAP and OLTP systems. Same reason it makes sense for some use cases to set up a Hadoop cluster, and for others to have a single but fast CPU: some tasks can be run in parallel, others can't.
You can calculate the value of each screen pixel without knowing the value of the others. To calculate the nth value of a Fibonacci sequence you have to calculate the (n-1)th and (n-2)th values first, so it doesn't matter how many cores you have. GPUs are for the former and CPUs for the latter.
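To make the contrast concrete, here is a minimal Python sketch. The screen size and the `shade` formula are made up for illustration, and the thread pool only shows that the pixel work can be split up in any order; it is not meant to demonstrate a real speedup. The Fibonacci function is the straightforward iterative version, where each step has to wait for the previous one.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4  # tiny hypothetical "screen"

def shade(pixel_index):
    """Value of one pixel; depends only on that pixel's own coordinates."""
    x, y = pixel_index % WIDTH, pixel_index // WIDTH
    return (x * 31 + y * 17) % 256  # made-up shading formula

def render_serial():
    # One pixel after another on a single worker.
    return [shade(i) for i in range(WIDTH * HEIGHT)]

def render_parallel():
    # No pixel needs another pixel's result, so the work can be handed
    # to several workers and the picture comes out the same.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(shade, range(WIDTH * HEIGHT)))

def fib(n):
    # Each iteration needs the two previous values, so step k cannot
    # start before step k-1 has finished: a dependency chain.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Adding workers to `render_parallel` changes nothing about the result, while there is no obvious way to hand different iterations of the `fib` loop to different workers.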
GPUs are made to execute a limited set of the same operations on a huge amount of data in parallel. This is a totally different workload from your usual computer programs, which commonly have a long and complicated series of commands. Thus just translating your program 1:1 to run on your GPU would make it way slower. Your GPU probably runs at around 1.5GHz, a third of your CPU's clock speed. The GPU does have a few thousand cores that run in parallel, while our consumer CPUs have at best a dozen (each more capable than any single GPU core), but most of those are often idle anyway, since it's a lot of work to make your software take advantage of them. For some tasks it's worth it, and those are the ones that take advantage of e.g. your GPU.
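A rough back-of-the-envelope model of this, in Python. Everything here is an assumption for illustration: one operation per cycle per core, perfect scaling across cores, and round numbers (4.5GHz with 12 cores versus 1.5GHz with 3000 cores) loosely based on the figures above. It ignores that a CPU core does more per cycle than a GPU core, so treat it as a sketch, not a benchmark.

```python
def run_time_seconds(serial_ops, parallel_ops, clock_hz, cores):
    """Crude estimate: one operation per cycle per core.

    Serial operations run on a single core; parallel operations are
    spread evenly over all cores. Real hardware is far messier.
    """
    return serial_ops / clock_hz + parallel_ops / (clock_hz * cores)

# Hypothetical round numbers, not measurements of any real chip.
CPU = dict(clock_hz=4.5e9, cores=12)
GPU = dict(clock_hz=1.5e9, cores=3000)

OPS = 3e9  # size of the made-up workload

# A purely serial task: only clock speed matters.
serial_cpu = run_time_seconds(OPS, 0, **CPU)
serial_gpu = run_time_seconds(OPS, 0, **GPU)

# A perfectly parallel task: core count dominates.
parallel_cpu = run_time_seconds(0, OPS, **CPU)
parallel_gpu = run_time_seconds(0, OPS, **GPU)

print(f"serial:   CPU {serial_cpu:.3f}s, GPU {serial_gpu:.3f}s")
print(f"parallel: CPU {parallel_cpu:.5f}s, GPU {parallel_gpu:.5f}s")
```

Under these assumptions the serial task takes three times as long on the GPU, while the perfectly parallel one finishes far sooner there. Most real programs sit somewhere in between.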
Circuits don't operate at the speed of light, though. People like to say this over and over again, but electron mobility through a circuit is only around 2/3 the speed of light, IIRC. They may have very tiny mass, but they still have mass.
No, it's the electromagnetic field which propagates at around 2/3c in the conductor. The electron drift velocity is on the order of a fraction of a mm/s.
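For a sense of scale, the usual drift velocity formula v = I / (n * A * q) can be evaluated in a few lines of Python. The wire here is an assumed example (1 A through 1 mm^2 of copper), and the carrier density is the commonly quoted approximate value for copper.

```python
ELECTRON_CHARGE = 1.602e-19      # coulombs
COPPER_CARRIER_DENSITY = 8.5e28  # free electrons per cubic metre (approximate)
SPEED_OF_LIGHT = 299_792_458     # metres per second

def drift_velocity_m_per_s(current_a, area_m2, n=COPPER_CARRIER_DENSITY):
    # v = I / (n * A * q)
    return current_a / (n * area_m2 * ELECTRON_CHARGE)

# Assumed example wire: 1 A through a 1 mm^2 cross-section.
v = drift_velocity_m_per_s(current_a=1.0, area_m2=1e-6)
v_mm_per_s = v * 1000

# Signal (field) propagation at roughly 2/3 c, as stated above.
signal_speed = (2 / 3) * SPEED_OF_LIGHT

print(f"drift velocity: {v_mm_per_s:.3f} mm/s")
print(f"signal is about {signal_speed / v:.1e} times faster")
```

For this example the electrons crawl along at well under a tenth of a millimetre per second, while the signal itself is many orders of magnitude faster.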
- GPUs don't handle I/O other than writing to video RAM.
- GPUs don't handle interrupts.
- GPUs don't handle branching well.
For the first two, a lot of infrastructure that is part of the chipset and platform would have to be extended to each compute unit of the GPU.
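The third point, branching, can be illustrated with a toy model in Python. It assumes the common design where a group of GPU threads (32 lanes here, an assumed group size) runs in lockstep: if the lanes disagree on an `if`, the group runs both sides one after the other with the inactive lanes masked off. The function name and the step costs are hypothetical.

```python
WARP_SIZE = 32  # lanes that execute in lockstep (assumed group size)

def lockstep_cost(conditions, cost_if_true, cost_if_false):
    """Steps a lockstep group spends on `if cond: A else: B`.

    conditions: one boolean per lane. If any lane takes a branch, the
    whole group has to step through that branch.
    """
    cost = 0
    if any(conditions):        # at least one lane takes the "true" side
        cost += cost_if_true
    if not all(conditions):    # at least one lane takes the "false" side
        cost += cost_if_false
    return cost

# All lanes agree: only one side of the branch is executed.
uniform = [True] * WARP_SIZE

# Lanes alternate: both sides are executed back to back.
divergent = [lane % 2 == 0 for lane in range(WARP_SIZE)]

uniform_cost = lockstep_cost(uniform, 10, 10)
divergent_cost = lockstep_cost(divergent, 10, 10)
print(uniform_cost, divergent_cost)
```

In this model a single disagreeing lane is enough to make the whole group pay for both branches, which is why branch-heavy code tends to suit CPUs better.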
Imagine a database on a GPU. It's not as if 1.5k-2k+ cores that are super good at math can all read and write your disk or disk array at once.