It's not the pointer casting, it's the indirect function call. The call target can probably be cached, but it will always be another memory location that the CPU needs to jump to.
If the compare function were inlined into the qsort() body, no function call would be needed at all. I think that was the point of the question above.
The way to answer this question is to look at the assembly code the compiler generates at various optimization levels and see whether it ever inlines the compare function.
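
As a rough sketch of that experiment, something like the file below could be compiled to assembly at different optimization levels. The names compare_ints, sort_ints and sort_ints.c are made up for illustration, and the commands in the comment assume a GCC-style compiler driver. What you will see depends on the compiler, its version, the flags and the C library, so no particular outcome is assumed here.

```c
/*
 * sort_ints.c (hypothetical file name)
 *
 * Compile to assembly only, once per optimization level, then compare:
 *   gcc -O0 -S sort_ints.c -o sort_ints_O0.s
 *   gcc -O2 -S sort_ints.c -o sort_ints_O2.s
 *
 * In each .s file, check whether compare_ints still exists as a separate
 * function whose address is passed to qsort, or whether it was inlined.
 */
#include <stdlib.h>

/* Comparator in the form qsort() expects: negative, zero, or positive. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

/* Hypothetical wrapper so there is a call site to inspect in the output. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

If the assembly for sort_ints still passes the address of compare_ints to qsort, the comparison is going through an indirect call for that build.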