| HN Mirror

This is what I do too (in my case I don't round up the allocation size and just let loads & stores potentially see the next object (doing tail stores via load+blend+store where needed; only works if multithreaded heap mutation isn't required though)).

The one case it can be annoying is passing pointers to constant data to custom-heap-assuming functions - e.g. to get a pointer to [n,n-1,n-2,...,2,1,0] for, say, any n≤64, make a global of [64,63,...,2,1,0] and offset its pointer; but you end up needing to add padding to the global, and this materializes as avoidable binary size increase as the "padding" could just be other constants from anywhere else. Copying the constant to the custom heap would be extra startup time and more memory usage (not sharable between processes).