by TinkersW
695 days ago
The simplest solution, and the one I use, is to have all SIMD-related buffers use a custom allocator (actually everything uses it) that always rounds the allocation size up to the SIMD width. Masked loads kinda suck: they are a tiny bit slower, and you now need a mask, which you also have to compute.
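A minimal sketch of that kind of allocator, not the commenter's actual code. It assumes a 32-byte SIMD width (one AVX2 register) and a C11 runtime that provides `aligned_alloc`; `SIMD_WIDTH`, `simd_round_up` and `simd_alloc` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed SIMD width in bytes (32 = one AVX2 register). Must be a power of two. */
#define SIMD_WIDTH ((size_t)32)

/* Round n up to the next multiple of SIMD_WIDTH. */
static size_t simd_round_up(size_t n) {
    assert(n <= SIZE_MAX - (SIMD_WIDTH - 1)); /* guard against wraparound */
    return (n + SIMD_WIDTH - 1) & ~(SIMD_WIDTH - 1);
}

/* Allocate at least n bytes, aligned to SIMD_WIDTH and padded to a multiple
   of it. A full-width load that starts at any multiple of SIMD_WIDTH from
   the base pointer and covers at least one requested byte then stays inside
   the allocation, so the tail needs no masked load. Release with free(). */
static void *simd_alloc(size_t n) {
    size_t padded = simd_round_up(n ? n : 1);
    /* C11 aligned_alloc wants the size to be a multiple of the alignment,
       which the rounding above guarantees. */
    return aligned_alloc(SIMD_WIDTH, padded);
}
```

The padding bytes are uninitialized, so whatever the tail lanes compute from them has to be ignored or be harmless.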
The one case where it can be annoying is passing pointers to constant data to functions that assume the custom heap. E.g. to get a pointer to [n,n-1,n-2,...,2,1,0] for, say, any n≤64, you make a global of [64,63,...,2,1,0] and offset its pointer; but you end up needing to add padding to the global, and this shows up as an avoidable increase in binary size, since the "padding" could just as well be other constants from anywhere else. Copying the constant to the custom heap would instead cost extra startup time and more memory (not sharable between processes).
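A rough sketch of the padded-global trick described above, assuming a 32-byte SIMD width and one-byte elements; `descending` and `descending_from` are hypothetical names. The offset pointer is generally not aligned, so the consumer would have to use unaligned loads.

```c
#include <stddef.h>
#include <stdint.h>

#define SIMD_WIDTH 32 /* assumed SIMD width in bytes */
#define MAX_N 64

/* 65 real entries (64 down to 0) followed by SIMD_WIDTH - 1 bytes of
   zero padding. The padding is the avoidable binary size: those bytes
   could have been any other constants. */
static const uint8_t descending[MAX_N + 1 + SIMD_WIDTH - 1] = {
    64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49,
    48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33,
    32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17,
    16, 15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,
     0
};

/* Pointer to [n, n-1, ..., 1, 0] for n <= MAX_N. Reading the n + 1 entries
   in whole SIMD_WIDTH-byte chunks from this pointer stays inside the array:
   the worst case ends at byte 96, which is exactly sizeof(descending). */
static const uint8_t *descending_from(unsigned n) {
    return descending + (MAX_N - n);
}
```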