You said you don't want to make users deal with flow control and hardware details. Does that imply a userspace bypass library that handles that stuff for us? Does it look POSIX-like?
Solarflare's OpenOnload and Mellanox's VMA both ship as LD_PRELOAD libraries that intercept traditional socket calls, unless you want to code your apps directly against their APIs.
It looks POSIX-like but uses high-level queues and fixes some issues with epoll. The lack of an atomic data unit and the overhead of the poor epoll interface cost too much to keep in a kernel-bypass design. Take a look at the paper for more details.
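
To make the "atomic data unit" point concrete, here is a minimal sketch in C. It is not the paper's actual API. With a stream socket, recv() can return any byte range, so the application has to reassemble messages itself; a queue interface can instead hand back whole elements. The names msg_queue, q_push, and q_pop and the sizes QCAP and MSG_MAX are all hypothetical, and this is a plain in-memory ring with no NIC or kernel bypass involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QCAP    8    /* hypothetical queue depth */
#define MSG_MAX 256  /* hypothetical maximum message size in bytes */

/* One atomic data unit: a whole message, never a partial byte range. */
struct msg {
    size_t len;
    unsigned char data[MSG_MAX];
};

/* Fixed-size ring of whole messages (illustrative, single-threaded). */
struct msg_queue {
    struct msg slots[QCAP];
    size_t head;   /* index of the next message to pop */
    size_t count;  /* number of messages currently queued */
};

/* Enqueue one complete message. Returns 0 on success,
 * -1 if the queue is full or the message is too large. */
int q_push(struct msg_queue *q, const void *buf, size_t len)
{
    assert(q != NULL && buf != NULL);
    if (q->count == QCAP || len > MSG_MAX)
        return -1;
    struct msg *m = &q->slots[(q->head + q->count) % QCAP];
    m->len = len;
    memcpy(m->data, buf, len);
    q->count++;
    return 0;
}

/* Dequeue one complete message into *out. Returns 0 on success,
 * -1 if the queue is empty. The caller never sees half a message. */
int q_pop(struct msg_queue *q, struct msg *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return -1;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 0;
}
```

Pushing "hello" and then "world!" yields two pops of 5 and 6 bytes respectively, in order, so there is no framing logic on the receive side. A real bypass library would avoid the copies here; the sketch only illustrates the message-granular contract.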