Since that article was published, things have changed a bit. At least for the filtering use case described in the blog post, XDP (https://www.iovisor.org/technology/xdp) has come along: an in-kernel mechanism for filtering packets at wire speed.
I don't disagree that there are many examples, but it's hard to solve them unless people give specifics.
Not really a problem for mobile... For HPC there have been user-mode and offload tweaks since forever. The reason none of the hardware-specific or task-specific techniques can replace the traditional BSD sockets stack is that writing a general-purpose replacement is a ton of work... work that Google would have to do many times over to replace Linux in an Android context.
That's one example among many. Basically, network data should magically appear in bulk, whereas in Linux it arrives almost one packet at a time.
As you move to 10G, this starts to eat a lot of CPU.
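To make the "in bulk" point a bit more concrete, here is a rough sketch in C of fetching several datagrams with one syscall via Linux's recvmmsg(), instead of one recv() per packet. It is only an illustration, not what any particular stack does: the names recv_batch and demo_batch and the BATCH and PKT_SIZE constants are made up for this example, and the demo uses an AF_UNIX datagram socketpair as a stand-in for a real NIC so that it runs anywhere on Linux.

```c
#define _GNU_SOURCE          /* recvmmsg() and struct mmsghdr are Linux/glibc extensions */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH    16          /* max datagrams fetched per syscall (illustrative value) */
#define PKT_SIZE 64          /* per-datagram buffer size (illustrative value) */

/* Hypothetical helper: receive up to BATCH datagrams from fd with a single
 * recvmmsg() call. Fills lens[i] with the size of datagram i.
 * Returns the number of datagrams received, or -1 on error. */
int recv_batch(int fd, char bufs[BATCH][PKT_SIZE], unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];

    memset(msgs, 0, sizeof msgs);
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = PKT_SIZE;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take whatever is already queued and return immediately. */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}

/* Hypothetical demo: queue npkts datagrams (npkts <= BATCH) on a local
 * socketpair, then drain them in one call. Returns the number of datagrams
 * that single call delivered, or -1 on error. */
int demo_batch(int npkts)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    for (int i = 0; i < npkts; i++) {
        char payload[PKT_SIZE];
        int len = snprintf(payload, sizeof payload, "packet %d", i);
        if (send(sv[0], payload, (size_t)len, 0) != len) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }

    char bufs[BATCH][PKT_SIZE];
    unsigned int lens[BATCH];
    int n = recv_batch(sv[1], bufs, lens);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling demo_batch(8) sends eight small datagrams and then picks them all up with one recvmmsg() call. This only amortizes the syscall overhead; the kernel still processes each packet individually underneath, which is the part the comment above is complaining about.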