by cburdick13 988 days ago
The main difference is GPU support. It's a large difference because the same lazily evaluated template type can be run on the CPU or the GPU through what we call an executor. On the CPU it's likely very similar to how xtensor already uses expression trees, but on the GPU it's quite a bit different because of the libraries backing it and the optimized kernels.
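
To make the executor idea concrete, here is a rough toy sketch in plain C++. Every name in it (View, Add, Mul, add, mul, SerialExecutor) is made up for illustration and is not the MatX or xtensor API; it only shows the general shape of a lazy expression type that does no work until an executor walks it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy types for illustration only; not the MatX or xtensor API.

// Leaf operand: a cheap, copyable, non-owning view over existing data.
struct View {
    const float* ptr;
    std::size_t n;
    float operator()(std::size_t i) const { return ptr[i]; }
    std::size_t size() const { return n; }
};

// Lazy nodes: they store their operands and compute one element on demand.
template <typename L, typename R>
struct Add {
    L lhs;
    R rhs;
    float operator()(std::size_t i) const { return lhs(i) + rhs(i); }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
struct Mul {
    L lhs;
    R rhs;
    float operator()(std::size_t i) const { return lhs(i) * rhs(i); }
    std::size_t size() const { return lhs.size(); }
};

// Building an expression only records its structure in the type; no math runs here.
template <typename L, typename R>
Add<L, R> add(L l, R r) {
    assert(l.size() == r.size());
    return {l, r};
}

template <typename L, typename R>
Mul<L, R> mul(L l, R r) {
    assert(l.size() == r.size());
    return {l, r};
}

// The executor decides where and how the expression is evaluated.
// This one is a serial CPU loop; a GPU executor would instead hand the
// same expression object to a kernel, with one thread per element.
struct SerialExecutor {
    template <typename Expr>
    void run(const Expr& e, std::vector<float>& out) const {
        out.resize(e.size());
        for (std::size_t i = 0; i < e.size(); ++i) {
            out[i] = e(i);  // the whole tree is evaluated in a single pass
        }
    }
};
```

Something like mul(add(a, b), c) has the type Mul<Add<View, View>, View>, and nothing is computed until SerialExecutor::run loops over it, so there are no intermediate temporaries. Swapping the executor changes where the loop runs without touching the expression itself.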

The syntax of MatX also allows us to do more kernel fusion in the future to improve performance without any changes to user code.