With GOMAXPROCS=n*CPU, n is roughly the amount of pre-emptive (vs. the built-in cooperative) multitasking that you want going on, with 1 being none. The pre-emptive part is handled by the OS, of course. Interesting that you noticed a speedup with n > 1.
I didn't think about that. I'll add it to my checklist of things to do when testing and benchmarking my projects, as another dimension to take care of. Aside from domain and range, good data, bad data, edge cases, and other parameters, I now need to add concurrency scenarios.