Where is your data? Is it in the CPU cache, or is it on the GPU? Computing where your data is, rather than moving your data to where your compute is, can often be the best option.
For small networks, staying on-chip is often a win, at least on the power side. But if you do need to go off-chip for memory, it's hard to beat the memory bandwidth you have on a GPU.
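
To make the tradeoff concrete, here is a rough back-of-the-envelope sketch in Python. It only compares time, not power. The function names and every number in it (link bandwidth, throughput of each device, workload sizes) are made-up assumptions for illustration, not measurements of any real CPU or GPU.

```python
def time_if_local(ops, local_ops_per_s):
    """Seconds to compute where the data already lives."""
    return ops / local_ops_per_s


def time_if_moved(data_bytes, ops, link_bytes_per_s, remote_ops_per_s):
    """Seconds to ship the data to the faster device, then compute there."""
    transfer = data_bytes / link_bytes_per_s
    compute = ops / remote_ops_per_s
    return transfer + compute


def should_move(data_bytes, ops, link_bytes_per_s, local_ops_per_s, remote_ops_per_s):
    """True if moving the data is faster than computing in place."""
    moved = time_if_moved(data_bytes, ops, link_bytes_per_s, remote_ops_per_s)
    local = time_if_local(ops, local_ops_per_s)
    return moved < local


# Hypothetical hardware figures, chosen only to keep the arithmetic simple.
LINK = 10e9      # bytes/s across the interconnect (assumed)
LOCAL = 1e11     # ops/s where the data currently sits (assumed)
REMOTE = 1e13    # ops/s on the faster device (assumed)

# Small workload: 4 MB of data, 2 million operations.
small = should_move(4_000_000, 2e6, LINK, LOCAL, REMOTE)

# Large workload: same 4 MB of data, but a trillion operations.
large = should_move(4_000_000, 1e12, LINK, LOCAL, REMOTE)

print("move small workload?", small)
print("move large workload?", large)
```

With these assumed numbers the transfer alone costs more than the whole small computation, so the small workload stays put, while the large one easily pays back the cost of the move.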