by exDM69
4574 days ago
> My usual guess would be that you can often hope for a 50% speedup in a tight loop by dropping from C to assembly

The problem with inline assembler is that the optimizer can barely touch it. By adding some inline asm, you may inhibit a lot of optimizations that could have given better performance overall. For this kind of task it is often a lot better to use intrinsics (e.g. xmmintrin.h for SSE) or compiler extensions such as __attribute__((vector_size(16))). This way you can use the CPU features you have available while still letting the optimizer do its high-level optimizations.
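To make the two alternatives concrete, here is a minimal sketch of the same loop (out[i] = a[i] * b[i] + c[i]) written once with SSE intrinsics from xmmintrin.h and once with the GCC/Clang vector_size extension. It assumes an x86 target with SSE and a GCC- or Clang-compatible compiler. The function names madd_sse and madd_vec are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>     /* memcpy */
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ... */

/* Variant 1 (hypothetical name): SSE intrinsics, 4 floats per iteration. */
void madd_sse(float *out, const float *a, const float *b,
              const float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads/stores, so no alignment requirement on the arrays. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
    for (; i < n; i++)  /* scalar tail for the last n % 4 elements */
        out[i] = a[i] * b[i] + c[i];
}

/* Variant 2 (hypothetical name): GNU vector extension, 16 bytes = 4 floats. */
typedef float v4sf __attribute__((vector_size(16)));

void madd_vec(float *out, const float *a, const float *b,
              const float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf va, vb, vc;
        /* memcpy avoids alignment and strict-aliasing problems;
           compilers normally turn it into a single vector load. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        v4sf r = va * vb + vc;  /* ordinary operators work element-wise */
        memcpy(out + i, &r, sizeof r);
    }
    for (; i < n; i++)  /* scalar tail */
        out[i] = a[i] * b[i] + c[i];
}
```

In both variants the compiler still sees ordinary C, so it remains free to inline the functions, unroll the loops, and allocate registers across the call site. An asm block, by contrast, is mostly opaque to those passes. The vector_size variant is also not tied to one instruction set, since GCC lowers it to whatever SIMD the target offers (or to scalar code if there is none).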