OK, is there a horrible speed penalty for writing your SIMD in pure assembly functions and then calling those functions? If you're writing assembly anyway, just drop the "inline" part.
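Sure, if you're willing to write a large enough chunk in assembly that you can eat the cost of not inlining it: the call overhead itself, plus the compiler not being able to optimize across the call boundary. If you just write a small leaf function or two, it will probably be a wash or perform worse.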
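To make the "large enough chunk" point concrete, here's a rough sketch in plain C. The function names are made up for illustration, and `__attribute__((noinline))` (a GCC/Clang extension) is only standing in for a routine you'd assemble separately, since the compiler can't inline either one. The first pattern pays for a call on every element; the second puts the whole loop behind a single call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tiny leaf routine written in a separate
   assembly file: noinline forces a real call on every use. */
__attribute__((noinline))
float add_one_pair(float a, float b) {
    return a + b;
}

/* Pattern 1: the small leaf is called once per element, so the call
   overhead is paid n times and the loop can't be optimized across it. */
float sum_per_call(const float *src, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = add_one_pair(acc, src[i]);
    return acc;
}

/* Pattern 2: the whole loop lives behind one call boundary, so the
   call overhead is paid once per buffer instead of once per element. */
__attribute__((noinline))
float sum_whole_buffer(const float *src, size_t n) {
    assert(n == 0 || src != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += src[i];
    return acc;
}
```

Both return the same result. How much the second shape actually wins depends on the compiler, the target, and how big the buffer is, so time it on your own workload before committing to either.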