by cburdick13
987 days ago
Hi, reaching feature parity with CuPy is a daunting task, especially for a C++ library. At this point we feel we have a good foundation for all kinds of basic and advanced tensor manipulations, and a growing number of library-based functions on top of that. The low-hanging fruit was wrapping the CUDA math libraries, so that is where we've made the most progress. Since MatX was originally intended for streaming/real-time processing, the focus has been on making C++ for CUDA easier to use for those kinds of applications. There are also a lot of things in CuPy and SciPy that don't make much sense to do on the GPU, such as offline tasks like filter design in signal processing. Also, since C++ users typically choose the language for maximum performance, we've put a large focus on getting as close as possible to hand-optimized CUDA. In general, most workloads we've tested are about 3-4x faster than the CuPy counterpart, thanks to better kernel fusion and lower language overhead. We have discussed supporting cuSPARSE as well, but cuSPARSE typically requires a different input type, like a CSR matrix. That is not something you'd typically want to detect and convert automatically, so we're still discussing ways of integrating it cleanly. We do have several versions of SVD, including one that calls cuSOLVER. Feedback is always appreciated!
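To illustrate why CSR counts as a different input type, here is a rough host-side sketch of a dense-to-CSR conversion in plain C++. The `CsrMatrix` struct and `dense_to_csr` function are hypothetical names made up for this example; they are not part of MatX or cuSPARSE, and a real integration would do this on the GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration only; not a MatX or cuSPARSE type.
struct CsrMatrix {
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<int> row_offsets;  // rows + 1 entries; row r spans [row_offsets[r], row_offsets[r + 1])
  std::vector<int> col_indices;  // one column index per stored nonzero
  std::vector<float> values;     // one value per stored nonzero
};

// Convert a row-major dense matrix to CSR, keeping only entries that are not exactly zero.
CsrMatrix dense_to_csr(const std::vector<float>& dense, std::size_t rows, std::size_t cols) {
  CsrMatrix csr;
  csr.rows = rows;
  csr.cols = cols;
  csr.row_offsets.reserve(rows + 1);
  csr.row_offsets.push_back(0);  // the first row always starts at offset 0

  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      const float v = dense[r * cols + c];
      if (v != 0.0f) {
        csr.col_indices.push_back(static_cast<int>(c));
        csr.values.push_back(v);
      }
    }
    // Running nonzero count marks where this row ends and the next begins.
    csr.row_offsets.push_back(static_cast<int>(csr.values.size()));
  }
  return csr;
}
```

A dense tensor is a single buffer plus a shape, while CSR needs three separate arrays whose sizes depend on the data. Building them requires a full pass over the matrix, which is part of why converting silently behind the user's back is unattractive.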