by andrew-wja
2750 days ago
Full disclosure: I'm a compiler person who for funding reasons moved into performance of machine learning systems. None of those things should be called compilers. At best, they are scaffolding for peephole optimization. When you can get these crazy speedups just from bolting on an instruction selector, that's a real indicator that a lot of stuff is just waiting to be done. For context, MKL-DNN embeds a JIT code generator built on Xbyak, targeting SSE4.2, AVX2, and AVX512. It sees all the dimensions of the tensors, it knows the strides of the kernel, and so on and so forth. So for us to be able to beat it by such a margin just by stepping back and doing something simple at a higher level of abstraction kinda indicates that the approach it's using is running up against some conceptual limits. It's not that MKL-DNN's JIT isn't good -- it's great, and it's a credit to the engineers working on it. But the problem is that the smarts are being applied in the wrong place!