by consteval
608 days ago
While this is true, the most effective optimizations are the ones you don't do yourself. The compiler or runtime does them and gets the low-hanging fruit. You can optimize further by hand, but unless your design is fundamentally bad, you're gonna be micro-optimizing. The performance gap between gcc -O0 and -O2 is HUGE. We don't really have anything that auto-magically does this for models yet. Compilers are intimately familiar with x86.
|
Having cache-friendly memory access patterns is perhaps the biggest one. Automatic vectorization is also still not quite there, though, so where there's a severe bottleneck, vectorizing by hand may still considerably improve performance, if the workload is vectorizable.
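
To make the cache point concrete, here's a rough sketch in C (the function names and the flat row-major layout are my own choices for illustration, not anything from the thread). Both functions compute the same sum over the same buffer. Only the traversal order differs.

```c
#include <stddef.h>

/* Assumption: the matrix is stored row-major in one flat buffer,
   so element (r, c) lives at m[r * cols + c]. */

/* Inner loop walks consecutive addresses, so every cache line
   that gets fetched is fully used before moving on. */
long sum_row_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same result, but the inner loop jumps cols * sizeof(int) bytes
   per step, so for large matrices most accesses land on a
   different cache line than the previous one. */
long sum_col_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

On matrices much bigger than the cache, the first version is usually noticeably faster, and the unit-stride inner loop is also the shape a compiler has the best chance of vectorizing on its own. How big the gap is depends on the hardware and matrix size, so measure before relying on it.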