This is because Intel is tricking you with the core count. I'm guessing you have an i5/i7/whatever with 8 cores but just two memory channels. Since it only takes two or three cores to saturate those memory channels, you will never be able to max out 8 cores on anything that streams through much more than $CACHE_SIZE (~4MB) of data. So you can use all 8 cores for compute-bound stuff like finding primes or brute-forcing RC5 (like distributed.net), but not much else.
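To put rough numbers on the "two or three cores" estimate, here is a back-of-the-envelope sketch. The 12.8 GB/s figure is the theoretical peak of one DDR3-1600 channel; the 10 GB/s a single core can stream is an assumed number for illustration, not a measurement, and the function names are made up for this example.

```python
import math


def cores_to_saturate(channels, channel_gbs, per_core_gbs):
    """Smallest number of streaming cores whose combined demand
    meets or exceeds the total memory bandwidth."""
    total_gbs = channels * channel_gbs
    return math.ceil(total_gbs / per_core_gbs)


def max_speedup(cores, channels, channel_gbs, per_core_gbs):
    """Upper bound on speedup for a purely bandwidth-bound workload:
    extra cores stop helping once the memory channels are full."""
    total_gbs = channels * channel_gbs
    return min(cores, total_gbs / per_core_gbs)


# Assumed setup: 2 channels of DDR3-1600 (12.8 GB/s peak each),
# and one core able to stream about 10 GB/s (illustrative guess).
needed = cores_to_saturate(channels=2, channel_gbs=12.8, per_core_gbs=10.0)
ceiling = max_speedup(cores=8, channels=2, channel_gbs=12.8, per_core_gbs=10.0)
print(f"cores needed to saturate memory: {needed}")
print(f"best-case speedup on 8 cores: {ceiling:.2f}x")
```

Under those assumptions, 3 cores fill both channels and 8 cores top out at roughly a 2.56x speedup. Real sustained bandwidth is lower than the theoretical peak, so the ceiling in practice would be hit even sooner.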
Depends on what you want and how much money you want to spend. If your workload is embarrassingly parallel, it could be cheaper to buy several servers with a CPU like the Xeon E5-1630 v3 (Edit: which has just 4 cores and 4 mem channels). Or, as you say, go the POWER8/SPARC route, which also includes the recent option of a single x86 server with up to 8x Xeon E7-8893 v3. For the latter option you may hit NUMA issues as well; depends on your workload.
Your choice boils down to the classic "message passing" vs. "shared memory" architectural trade-off (think MPI vs. OpenMP). The optimal solution depends on the specific application, as well as on how far you want to scale it.
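For anyone who hasn't seen the two styles side by side, here is a toy sketch of the difference. It uses plain Python threads for both versions purely to show the shape of the code; it is not MPI or OpenMP, and the function names are invented for this example. In the shared-memory version every worker updates one accumulator under a lock; in the message-passing version each worker owns its slice and only sends a result message back.

```python
import queue
import threading


def chunks(data, n):
    """Split data into at most n contiguous slices of similar size."""
    size = max(1, (len(data) + n - 1) // n)
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_memory_sum(data, workers=4):
    """Shared-memory style: all workers write into one accumulator."""
    total = [0]
    lock = threading.Lock()

    def work(part):
        local = sum(part)
        with lock:  # synchronisation guards the shared state
            total[0] += local

    threads = [threading.Thread(target=work, args=(p,))
               for p in chunks(data, workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(data, workers=4):
    """Message-passing style: workers share nothing and report
    their partial result as a message on a queue."""
    results = queue.Queue()

    def work(part):
        results.put(sum(part))  # "send" the partial sum

    parts = chunks(data, workers)
    threads = [threading.Thread(target=work, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # "Receive" one message per worker and reduce them.
    return sum(results.get() for _ in parts)


data = list(range(1000))
print(shared_memory_sum(data), message_passing_sum(data))
```

The message-passing shape carries over to multiple machines more or less unchanged, since nothing relies on a common address space; the shared-memory shape is usually simpler to write but ties you to a single box (and to its NUMA behaviour).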