As long as std::sort fits in the instruction cache, who cares (outside the embedded world, obviously)? It will always be faster unless you're getting regular instruction cache misses.
Even if each instance of std::sort fits in the instruction cache, it's helping to push some other code elsewhere in the program out of the instruction cache, slowing that code down in the process.
If std::sort pushes something out of icache, it's to make room for inlining swaps and compares. I'm skeptical that the other things bumped from icache should be kept hotter.
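To make the inlining point concrete, here's a rough sketch of the two call styles being compared. The names compare_ints, sort_with_qsort and sort_with_std_sort are made up for illustration, and whether the compiler actually inlines the lambda depends on your compiler and optimization flags, so treat this as an assumption to check in the disassembly rather than a guarantee.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical comparator for qsort. It is passed as a function pointer,
// so the sort loop typically makes an indirect call for every comparison.
static int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);  // avoids overflow from lhs - rhs
}

// One shared copy of qsort serves every element type: less code, more calls.
std::vector<int> sort_with_qsort(std::vector<int> v) {
    if (!v.empty()) {
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    }
    return v;
}

// std::sort is instantiated for this iterator and comparator type, so the
// compiler can usually inline the comparison and the swaps. Each distinct
// instantiation adds its own copy of the sort code to the binary.
std::vector<int> sort_with_std_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}
```

Both functions return the same sorted vector; the difference under discussion is only in the generated code, i.e. one generic routine with indirect calls versus a specialized copy per instantiation.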
In addition to the "unless you sort 10 elements" factor, as stated in a sibling comment:
LLVM + libclang together are already 46 MB of code, and they're libraries designed to be statically linked. Sure, they aren't going to fill your drive, but they will take a comparatively long time to load from disk (especially as part of another application; it doesn't matter much on the command line), and it's that much harder to justify including them in an otherwise small downloaded package.