|
|
|
|
|
by derf_
303 days ago
|
|
One time I spent a week carefully rewriting all of the SIMD asm in libtheora, really pulling out all of the stops to go after every last cycle [0], and managed to squeeze out 1% faster total decoder performance. Then I spent a day reorganizing some structs in the C code and got 7%. I think about that a lot when I decide what optimizations to go after. [0] https://gitlab.xiph.org/xiph/theora/-/blob/main/lib/x86/mmxl... is an example of what we are talking about here. |
|