The cache may be small, but (unless you're talking about something different) graphics memory is fast and GPU clock speeds are moderate, so cache isn't as critical on a GPU as on a CPU.
The trouble is that you have hundreds more processors. So even if the memory is twice as fast, and each processor twice as slow, memory access is still the dominating factor in efficiency.
You can work round that by being very careful, arranging things so that processors access memory in sequence, which lets reads be coalesced (you're streaming data from contiguous addresses, avoiding the "seek time" of random access). But that only works if all the processors are focussed on the same job.
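To put rough numbers on the coalescing point, here is a toy model in Python. It is only a sketch, not a measurement of any real GPU: the group size of 32 threads, the 128-byte transaction width, and the helper names `segments_touched` and `warp_addresses` are all assumptions I've made up for illustration. It just counts how many separate memory segments a group of threads touches when they read in sequence versus when they read scattered addresses.

```python
# Toy model, not a benchmark: count the memory segments one group of threads touches.
WARP_SIZE = 32        # assumed number of threads issuing a load together
SEGMENT_BYTES = 128   # assumed width of one memory transaction
WORD_BYTES = 4        # each thread reads one 32-bit value


def segments_touched(addresses):
    """Number of distinct SEGMENT_BYTES-wide segments covering the byte addresses."""
    return len({addr // SEGMENT_BYTES for addr in addresses})


def warp_addresses(stride_words, base=0):
    """Byte address read by each thread when thread i reads word i * stride_words."""
    return [base + i * stride_words * WORD_BYTES for i in range(WARP_SIZE)]


sequential = segments_touched(warp_addresses(1))   # neighbouring threads, neighbouring words
strided = segments_touched(warp_addresses(32))     # each thread 128 bytes from the last
print(sequential, strided)  # prints "1 32"
```

Under those assumptions the sequential pattern needs one transaction and the strided one needs 32, which is the gap you lose as soon as the processors stop marching through memory together.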
Now you can say I'm just describing the standard problems with GPUs, and I'd agree, but my point is that even in Fermi (which is a huge step forwards in many ways) these will still dominate. And it's hard to see how most software fits into such an approach. Hence my warning that they are not becoming general purpose.