
XLA and Weld do have similar optimizations. At their core, one of the main things both do is remove inefficiencies, such as unnecessary scans over data and repeated common subexpressions, across many operators. The benchmark you're referring to also involved some NumPy code for pre-processing, and Weld outperformed XLA there because it could apply those optimizations across both the TensorFlow operators and the NumPy functions (whereas XLA only optimizes the TensorFlow part of the application).
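To make "unnecessary scans" concrete, here's a rough sketch in plain Python. It is not Weld or XLA code, and the function names (`preprocess`, `model_op`, `fused`) are made up for illustration: one function stands in for a NumPy pre-processing step, the other for a TensorFlow operator. Run separately, they scan the data twice and build a temporary list in between; fused by hand, they do the same work in one pass.

```python
def preprocess(xs):
    # Hypothetical stand-in for a NumPy pre-processing step:
    # one full scan that materializes a temporary list.
    return [x * 0.5 for x in xs]


def model_op(xs):
    # Hypothetical stand-in for a TensorFlow operator:
    # a second full scan over the intermediate result.
    return sum(x * x for x in xs)


def unfused(xs):
    # Two separate passes with an intermediate list between them.
    return model_op(preprocess(xs))


def fused(xs):
    # The same computation fused by hand: a single pass over the
    # data and no intermediate list.
    total = 0.0
    for x in xs:
        y = x * 0.5
        total += y * y
    return total


data = [1.0, 2.0, 3.0, 4.0]
assert unfused(data) == fused(data)
```

An optimizer that only sees one library can fuse within that library, but the boundary between `preprocess` and `model_op` stays a separate pass. Seeing both sides is what makes the fused version possible.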

I also want to mention that this benchmark is from a while back (around 2017, I believe), so it's possible that improvements in both XLA and Weld would make the numbers look different today :)