FMV (function multiversioning) is pretty neat, but there are scenarios where it's not ideal, so I'm not surprised it isn't used in what is meant to be a lightweight, high-performance library.
Notably, because the dispatching happens at runtime, you pay for the convenience with larger code size and extra dispatch code in your critical path. Additionally, I've anecdotally seen the power heuristics on modern Intel hardware penalize you for even _speculatively_ running some of the wider instruction sets.
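For anyone who hasn't used FMV, here is a minimal sketch of what it looks like, assuming GCC on x86-64 Linux with glibc. `sum_f32` and the `MULTIVERSION` macro are names I made up for illustration; they are not from the library being discussed.

```c
#include <stdlib.h>  /* size_t; on glibc systems this also defines __GLIBC__ */

/* Assumption: GCC targeting x86-64 Linux with glibc, where target_clones is
   implemented with an ifunc resolver. On any other toolchain the macro
   expands to nothing and you get a single plain build of the function. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && \
    defined(__linux__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("default", "avx2")))
#else
#define MULTIVERSION
#endif

/* Hypothetical kernel. The compiler emits one clone per listed target plus
   a resolver that picks between them when the program is loaded. */
MULTIVERSION
float sum_f32(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}
```

As far as I understand the GCC/glibc implementation, the choice is made once by the resolver, but every call to `sum_f32` then goes through an indirect jump and can't be inlined into its callers, and the binary carries every clone. That's the code size and critical-path cost I mean.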
Ah, apologies, I only took a cursory look at his dispatching logic. It looks to me like it supports both compile-time and runtime selection, but you have to hand-roll the dispatching logic if you want to use it at runtime (he does provide an example). If you _really_ need runtime dispatch then yeah, I'd agree FMV is probably cleaner.
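To be concrete about what hand-rolling involves, here is a rough sketch of the general pattern. It is not his code: `sum_scalar`, `sum_avx2`, `pick_sum`, `sum_dispatch` and `HAVE_AVX2_KERNEL` are all hypothetical names, and it assumes GCC or Clang on x86-64 for the `target` attribute and the `__builtin_cpu_supports` builtin.

```c
#include <stddef.h>

typedef float (*sum_fn)(const float *, size_t);

/* Baseline version, built with whatever flags the file is compiled with. */
static float sum_scalar(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_AVX2_KERNEL 1
/* Same loop, but the compiler may use AVX2 instructions in this function. */
__attribute__((target("avx2")))
static float sum_avx2(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}
#endif

/* Runtime mode: ask the CPU what it supports and pick a kernel. */
sum_fn pick_sum(void)
{
#if defined(HAVE_AVX2_KERNEL)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_scalar;
}

float sum_dispatch(const float *x, size_t n)
{
#if defined(HAVE_AVX2_KERNEL) && defined(__AVX2__)
    /* Compile-time mode: built with -mavx2, so call the kernel directly. */
    return sum_avx2(x, n);
#else
    /* Cached function pointer. Not thread-safe as written; a real version
       would initialise this once at startup or use an atomic. */
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = pick_sum();
    return impl(x, n);
#endif
}
```

The upside of doing it by hand is that you decide where the check lives, so you can hoist it out of the hot loop or drop it entirely in a build that targets one known CPU. The downside is exactly this boilerplate, which FMV generates for you.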