You aren't using LLVM IR, then? I've found that if you're a bit careful, LLVM is perfectly capable of automatically lowering IR to vector instructions on ARM and x86.
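As a rough illustration, here is the kind of loop LLVM's loop vectorizer usually handles. This is a sketch that assumes "a bit careful" means things like non-aliasing (`restrict`) pointers and a simple counted loop with unit stride and no early exits. The function name `saxpy` is just an example, not anything from the original setup.

```c
#include <stddef.h>

/* Assumed "careful" style for auto-vectorization:
 * - `restrict` promises x and y do not overlap, so the vectorizer
 *   does not need to emit runtime overlap checks
 * - simple counted loop, unit stride, no early exits or calls */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Compiling with something like `clang -O2 -Rpass=loop-vectorize -c saxpy.c` should print an optimization remark if the loop was vectorized. Adding `-S -emit-llvm` lets you look for vector types such as `<4 x float>` in the IR. Passing a different `--target` (e.g. an AArch64 triple) shows the same IR being lowered to NEON instead of SSE/AVX. The exact vector width depends on the target and flags.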