by NullCascade
303 days ago
What is the actual process of identifying hotspots caused by suboptimal compiler-generated assembly? Would it ever make sense to write handwritten compiler intermediate representation like LLVM IR instead of architecture-specific assembly?

The factors are something like:
- specialization: there's already a decent plain-C implementation of the loop; the asm/SIMD versions are added on top for specific hardware platforms. Different platforms have different SIMD features, so it's hard to generalize across them.
- predictability: users have different compiler versions, so even if there is a good one out there not everyone is going to use it.
- optimization difficulties: C's aliasing rules specifically make optimization difficult here, because video data is `char *` and `char *` is allowed to alias everything. Also, the two kinds of features compilers add for this (intrinsics and autovectorization) can fight each other and make the result worse than doing nothing.
- taste: you could imagine a better portable language for writing SIMD in, but C isn't it. And on Intel, C with intrinsics definitely isn't it, because their stuff was invented by Microsoft, who were famous for having absolutely no aesthetic taste in anything. The assembly is /more/ readable than the C would be, because the C would be all function calls with names like `_mm_movemask_epi8`.
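
To make the aliasing point concrete, here's a rough sketch in C. `FilterCtx`, `scale_plain` and `scale_restrict` are made-up names for illustration, not from any real codebase, and it assumes `uint8_t` is a typedef for `unsigned char` (true on the usual platforms). Whether a given compiler actually vectorizes either version depends on the compiler version and flags, which is the predictability problem again.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-filter state, for illustration only. */
typedef struct {
    int gain;
} FilterCtx;

/* dst is a character-type pointer, so the compiler has to assume that a
 * store through dst might overwrite ctx->gain. It may therefore reload
 * ctx->gain on every iteration instead of keeping it in a register. */
void scale_plain(uint8_t *dst, const uint8_t *src,
                 const FilterCtx *ctx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((src[i] * ctx->gain) >> 8);
}

/* Same arithmetic, but the programmer promises no overlap via restrict
 * and copies the field into a local before the loop, so the loop body
 * no longer depends on memory that dst could touch. */
void scale_restrict(uint8_t *restrict dst, const uint8_t *restrict src,
                    const FilterCtx *restrict ctx, size_t n)
{
    const int gain = ctx->gain;
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((src[i] * gain) >> 8);
}
```

Both functions compute the same thing when the buffers don't overlap; the only difference is how much the compiler is allowed to assume. Comparing the generated assembly for the two (e.g. with `-O2 -S`) is one way to see whether the aliasing is what's hurting you.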