That is not, in general, the behavior I've observed on native hardware. Without getting into hard numbers, I routinely see multiple gigabits per second per core.
With which options? Standard tap (or tun?), or macvtap? In the last test I did (macvtap to an external interface, with none of the offload options enabled), a gigabit used a lot of CPU: roughly a whole CPU just copying data.