by dahart 532 days ago
And it’s always reliably optimized out in release builds, I assume?

On platforms that require aligned loads and stores (not x86 or ARM), a direct pointer cast sometimes compiles to a single aligned load/store where a memcpy compiles to multiple byte-wide loads/stores, even on a good compiler, because memcpy() doesn't require its pointers to be aligned. This can be mitigated by going through a local variable, but it gets pretty verbose.
Some ARM CPUs do require aligned loads and stores, such as the Cortex-M0+ in a Raspberry Pi Pico.
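The local-variable approach can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions: `load_u32` and `store_u32` are hypothetical helper names, not code from the thread. The memcpy copies bytes into or out of a naturally aligned local, so the rest of the code never dereferences a possibly misaligned typed pointer (which would also be undefined behavior in C++). Whether each memcpy becomes one load/store or several byte-wide ones depends on the target (e.g. a strict-alignment core such as the Cortex-M0+) and on what the compiler can prove about the address.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit value from an address of unknown alignment.
// memcpy places no alignment requirement on `src`, so this is valid even when
// `src` is misaligned, unlike *reinterpret_cast<const std::uint32_t*>(src).
inline std::uint32_t load_u32(const void* src) {
    assert(src != nullptr);
    std::uint32_t value;                     // local, naturally aligned
    std::memcpy(&value, src, sizeof value);  // byte copy into the local
    return value;                            // use the local as a normal integer
}

// Hypothetical helper: write a 32-bit value to an address of unknown alignment.
inline void store_u32(void* dst, std::uint32_t value) {
    assert(dst != nullptr);
    std::memcpy(dst, &value, sizeof value);  // byte copy out of the local
}
```

For example, `store_u32(buf + 1, 0x11223344u)` followed by `load_u32(buf + 1)` round-trips the value at an odd offset into an `unsigned char` buffer, regardless of the target's endianness.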
Sounds like a good place for a macro?
We have memcpy behind a C++ template function that mimics the interface of std::bit_cast.
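A template along those lines might look like the sketch below. It is not the commenter's actual code: the namespace `util` is hypothetical, and the function only follows the `std::bit_cast<To>(from)` call shape, with static checks for the preconditions that memcpy-based type punning relies on. Unlike C++20 `std::bit_cast`, this version is not `constexpr`, and it additionally requires `To` to be default-constructible because it creates a local `To` before copying into it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace util {  // hypothetical namespace, avoids clashing with std::bit_cast

// Reinterpret the object representation of `from` as a `To`, via memcpy.
template <class To, class From>
To bit_cast(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "bit_cast requires equal sizes");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_default_constructible<To>::value,
                  "this sketch default-constructs a local To");
    To to;                                  // local destination object
    std::memcpy(&to, &from, sizeof(To));    // copy the bytes across
    return to;
}

}  // namespace util
```

Usage: `util::bit_cast<std::uint32_t>(1.0f)` yields `0x3F800000` on targets with IEEE 754 single-precision floats. Because the size check is a `static_assert`, a mismatched pair such as `util::bit_cast<std::uint64_t>(1.0f)` fails at compile time rather than copying the wrong number of bytes.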
For MSVC you have to add "/Oi"; otherwise it is always a function call at lower optimisation levels. Clang and GCC always treat it as an intrinsic, even in debug builds.
I haven't manually checked every case, but it's normally folded into the surrounding load, shift, or whatever else uses it, and erased completely.