I work on a C library. Some of the functions I've written, like memmove(), take about 7 picoseconds per byte for sizes that fit in the L1 cache, thanks to enhanced rep movsb (ERMS).
That's a very special case, though, since the hardware is optimized to move up to a cache line at a time, and it has nothing to do with the syscall cost that was mentioned in the parent comment.
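For anyone who wants to check a per-byte figure like that on their own machine, here is a rough sketch of a timing loop. It is not the benchmark behind the number above: the helper name memmove_ps_per_byte, the buffer size, and the iteration count are all made up for illustration, it assumes a POSIX system with clock_gettime(), and the empty asm barrier is a GCC/Clang extension.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() under strict -std=c11 */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: average cost of memmove() in picoseconds per byte.
 * Returns a negative value if allocation fails. */
static double memmove_ps_per_byte(size_t size, size_t iterations) {
    assert(size > 0 && iterations > 0);

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch both buffers so they are mapped and warm in cache. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iterations; i++) {
        memmove(dst, src, size);
        /* Compiler barrier so the copy is not optimized away
         * (GCC/Clang extension, an assumption about the toolchain). */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    free(src);
    free(dst);

    /* 1 ns = 1000 ps; divide by the total number of bytes moved. */
    return ns * 1000.0 / ((double)size * (double)iterations);
}
```

Calling memmove_ps_per_byte(16 * 1024, 1000000) from main() keeps both buffers inside a typical 32 KiB L1 data cache. The result depends heavily on the CPU, the libc's memmove() implementation, and the clock frequency, so treat it as a ballpark only.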
The 5us was the setup time needed to enter the sandbox. A system call costs around 1us, but system calls are rarely used. So in general the overhead of using the sandbox is around 5us, since everything else is pure workload.
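As a back-of-the-envelope model of that arithmetic, assuming the two rough figures above are simply additive (the function name sandbox_overhead_us is hypothetical):

```c
/* Rough overhead model, in microseconds, using the approximate
 * figures from the comment: a fixed setup cost plus a per-syscall cost.
 * Treating the two as additive is an assumption. */
static double sandbox_overhead_us(unsigned syscalls) {
    const double setup_us = 5.0;    /* one-time cost to enter the sandbox */
    const double syscall_us = 1.0;  /* approximate cost per system call */
    return setup_us + syscall_us * (double)syscalls;
}
```

With zero system calls this gives the 5us baseline, and a workload that makes, say, three system calls would come to about 8us of overhead.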