by cburdick13 987 days ago
Hi, besides a (subjectively) easier syntax, performance should be higher than with libtorch. Every operator expression (think of it as an arithmetic expression) is evaluated at compile time and is often fused into a single GPU kernel. This also removes the need for JIT compilation. If there's a specific workflow you'd like to see compared between libtorch and MatX, please let us know and we can try it out.
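To make the fusion idea concrete, here is a minimal CPU-only sketch of expression templates, a common C++ technique for this kind of compile-time composition. It is not MatX's implementation or API: `View`, `AddOp`, `MulOp`, and `evaluate` are hypothetical names, and a plain loop stands in for the single GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical names throughout; this is an illustration, not MatX code.
struct ExprTag {};  // marks types that may take part in an expression

// Leaf node: a cheap, copyable view over storage the caller already owns.
struct View : ExprTag {
    const float* ptr;
    std::size_t n;
    explicit View(const std::vector<float>& v) : ptr(v.data()), n(v.size()) {}
    float operator()(std::size_t i) const { return ptr[i]; }
    std::size_t size() const { return n; }
};

// Operator nodes store their operands by value and compute one element on demand.
template <typename L, typename R>
struct AddOp : ExprTag {
    L lhs; R rhs;
    AddOp(L l, R r) : lhs(l), rhs(r) {}
    float operator()(std::size_t i) const { return lhs(i) + rhs(i); }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
struct MulOp : ExprTag {
    L lhs; R rhs;
    MulOp(L l, R r) : lhs(l), rhs(r) {}
    float operator()(std::size_t i) const { return lhs(i) * rhs(i); }
    std::size_t size() const { return lhs.size(); }
};

// The operators only build a type describing the computation; no math runs here.
template <typename L, typename R,
          typename = typename std::enable_if<std::is_base_of<ExprTag, L>::value &&
                                             std::is_base_of<ExprTag, R>::value>::type>
AddOp<L, R> operator+(L l, R r) { return AddOp<L, R>(l, r); }

template <typename L, typename R,
          typename = typename std::enable_if<std::is_base_of<ExprTag, L>::value &&
                                             std::is_base_of<ExprTag, R>::value>::type>
MulOp<L, R> operator*(L l, R r) { return MulOp<L, R>(l, r); }

// One loop evaluates the whole expression tree per element, with no
// intermediate arrays. On a GPU this loop body would be the fused kernel.
template <typename E>
void evaluate(const E& expr, std::vector<float>& out) {
    assert(out.size() == expr.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = expr(i);
}
```

With vectors `a`, `b`, `c`, and a preallocated `out`, calling `evaluate(View(a) + View(b) * View(c), out)` makes a single pass over the data. The expression's structure lives entirely in the type `AddOp<View, MulOp<View, View>>`, which is why nothing needs to be compiled at run time.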

Why can’t you keep the calling interfaces of functions as close to the Py libraries as possible, simplifying the transition for everyone? Will that really destroy the performance increase? Common calling interfaces make everything much simpler. Even in this simple example, the calls differ significantly.
We've tried our best to match Python as closely as we can, falling back to MATLAB-style where Python doesn't have an equivalent. Many of our unit tests are verified against Python, so the conversion is typically very easy. The one thing Python has that makes this much easier is keyword arguments. We've tried to use overloads to mimic them as best we could.
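As a rough illustration of the overload approach, the sketch below mimics NumPy's `linspace(start, stop, num=50, endpoint=True)` defaults with plain C++ overloads. The function here is a hypothetical stand-in written for this example; it is not taken from MatX and its signature should not be read as MatX's.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not a MatX function.
// Full form: every option is passed positionally.
std::vector<double> linspace(double start, double stop, std::size_t num, bool endpoint) {
    std::vector<double> out(num);
    if (num == 0) return out;
    if (num == 1) { out[0] = start; return out; }
    // With endpoint == true the last sample lands exactly on `stop`.
    const double div = endpoint ? static_cast<double>(num - 1) : static_cast<double>(num);
    const double step = (stop - start) / div;
    for (std::size_t i = 0; i < num; ++i) out[i] = start + step * static_cast<double>(i);
    return out;
}

// Shorter overloads supply the defaults that Python expresses as keywords.
std::vector<double> linspace(double start, double stop, std::size_t num) {
    return linspace(start, stop, num, true);
}

std::vector<double> linspace(double start, double stop) {
    return linspace(start, stop, 50, true);
}
```

The limitation is visible here: `linspace(0.0, 1.0, 5)` works, but there is no way to set `endpoint` without also passing `num`, which Python's `endpoint=False` allows directly.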

That being said, in the example on the home page there are notable differences:

1) We have the run() method. The expression before run() is lazily evaluated for performance and does not execute anything on its own. Having run() lets you run the same line of code on either a CPU or a GPU by changing the argument to run().

2) In MatX, memory allocation is explicit. Python allocates as needed, but this causes a performance penalty from allocations and deallocations that are not under your control. Specifically, in the FFT example, NumPy allocates an ndarray before the FFT runs, but on the same line. In MatX the allocation is (typically) done before the operation, so you can control the performance of the hot path of code.
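Both points can be sketched in a few lines of standalone C++. This is only an analogy, not MatX code: `assign_lazy`, `Scaled`, `SerialExec`, and `ChunkedExec` are hypothetical names, and `ChunkedExec` is just a second CPU loop standing in for a different backend such as a GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical executors; each decides how the per-element work is scheduled.
struct SerialExec {
    template <typename F>
    void for_each(std::size_t n, F f) const {
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
};

// Stand-in for a second backend: same work, processed block by block.
struct ChunkedExec {
    std::size_t chunk;
    template <typename F>
    void for_each(std::size_t n, F f) const {
        assert(chunk > 0);
        for (std::size_t b = 0; b < n; b += chunk)
            for (std::size_t i = b; i < std::min(n, b + chunk); ++i) f(i);
    }
};

// A source expression: input scaled by a constant, computed per element on demand.
struct Scaled {
    const std::vector<float>* in;
    float k;
    float operator()(std::size_t i) const { return (*in)[i] * k; }
};

// A deferred assignment: it records destination and source, and does no work
// until run() is called with an executor.
template <typename Src>
struct Assign {
    std::vector<float>& dst;  // storage the caller allocated ahead of time
    Src src;
    template <typename Exec>
    void run(Exec exec) const {
        exec.for_each(dst.size(), [&](std::size_t i) { dst[i] = src(i); });
    }
};

template <typename Src>
Assign<Src> assign_lazy(std::vector<float>& dst, Src src) {
    return Assign<Src>{dst, src};
}
```

A caller would allocate `out` once, outside the hot path, then write `assign_lazy(out, Scaled{&in, 2.0f}).run(SerialExec{})`. Swapping in `ChunkedExec{2}` changes where and how the work runs without touching the expression, and `run()` itself never allocates.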

If you have any specific suggestions, we would love to hear them.