Hacker News
by vortico 1645 days ago
Isn't base64 encoding/decoding bound to memory bandwidth? If so, won't all 4/3 memory formats encode/decode at the same speed?
3 comments

Even when memory bandwidth is the bottleneck (which I’m not sure about), optimizations help.

Modern CPUs don't simply stall on every memory access: while one instruction is waiting for memory to be read, the core can keep executing other, independent work.

As such, if this base64 implementation is more efficient, then even if the “wall clock” time is identical because memory bandwidth sets the floor, the CPU has more time to perform _other_ tasks.

Memory bandwidth is a complex beast. One should be able to get 50 GB/s for short decodes of single- to double-digit kilobytes on modern hardware. The author measured roughly 11 GB/s in their memcpy benchmark, but only about half that for decode. If memory bandwidth were the wall, decode should be much closer to memcpy in performance. I could see fusing base64 decode and JSON parsing into a single function.
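
For anyone who wants to repeat the copy-versus-decode comparison on their own machine, here is a rough sketch of the methodology using only the Python standard library. The helper names (`throughput_gb_per_s`, `compare`) and the 32 KiB buffer size are my own assumptions, not anything from the article, and the absolute numbers reflect CPython's `base64` module rather than the SIMD implementation being discussed. Only the gap between the two rates is of interest.

```python
import base64
import os
import time


def throughput_gb_per_s(func, data, repeats=20, inner=100):
    """Best-of-`repeats` throughput of func(data), in GB/s of input consumed."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(inner):
            func(data)
        elapsed = (time.perf_counter() - start) / inner
        best = min(best, elapsed)
    # Guard against a zero reading on very coarse timers.
    return len(data) / max(best, 1e-12) / 1e9


def compare(size=32 * 1024):
    """Return (copy_rate, decode_rate) for a base64 buffer built from `size` random bytes."""
    encoded = base64.b64encode(os.urandom(size))
    # bytearray(encoded) makes a plain copy of the buffer: a memcpy-like baseline.
    copy_rate = throughput_gb_per_s(bytearray, encoded)
    decode_rate = throughput_gb_per_s(base64.b64decode, encoded)
    return copy_rate, decode_rate


if __name__ == "__main__":
    copy_rate, decode_rate = compare()
    print(f"copy:   {copy_rate:.2f} GB/s")
    print(f"decode: {decode_rate:.2f} GB/s")
```

If decode were purely bandwidth-bound, the two rates would be expected to land close together. A large gap suggests that compute is still the limiting factor at that buffer size.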

It would be cool to see a demo of a processor maintaining good IPC on one hyperthread while the other hyperthread on the same core runs a base64 decode.

Memory bandwidth is the limit, so you can't make it faster than memcpy(). However, copying memory is still faster than these algorithms, and it's complicated to make the conversion speed closer to the speed of copying memory.
What you're saying is that memory bandwidth is a limit but not the current limiting factor. So your answer to GP is "no".
Only if you don't stream it into a more compute-intensive process (you'd need to leave cycles free for that), and only if you are reading it from a cache-cold buffer rather than, e.g., a network buffer that would already be in cache coming out of the net stack.
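
To make the "stream it into a more compute-intensive process" case concrete, here is a small sketch that decodes in chunks and hands each decoded chunk to a consumer straight away, instead of decoding the whole buffer first. SHA-256 is only a stand-in for the downstream work. The function names and the 64 KiB chunk size are hypothetical choices of mine, and the input is assumed to be unbroken base64 with no embedded newlines.

```python
import base64
import hashlib

CHUNK = 64 * 1024  # encoded bytes per step; must be a multiple of 4


def decode_then_hash(encoded):
    """Two passes: decode the whole buffer, then hash the decoded bytes."""
    return hashlib.sha256(base64.b64decode(encoded)).hexdigest()


def decode_and_hash_chunked(encoded, chunk=CHUNK):
    """One pass: decode a chunk, feed it to the consumer, move on."""
    if chunk % 4 != 0:
        raise ValueError("chunk must be a multiple of 4 encoded bytes")
    digest = hashlib.sha256()
    view = memoryview(encoded)  # slicing a memoryview avoids copying the input
    for start in range(0, len(view), chunk):
        # Each 4-byte group decodes independently, so 4-aligned slices are
        # valid base64 on their own; padding can only appear in the last one.
        digest.update(base64.b64decode(view[start:start + chunk]))
    return digest.hexdigest()
```

Both functions return the same digest. The chunked version just keeps each decoded piece small enough that the consumer is likely to find it still in cache, which is the situation where decode speed, not memory bandwidth, decides how many cycles are left over.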