I wrote up a Reddit post on a possible workaround for removing the overhead. It's standard C++ and requires no ABI break. It's not without caveats, though: https://www.reddit.com/r/cpp/comments/do8l2p/working_around_...