by crest
2865 days ago
They do not. The difference is like that between a BLAS implementation that uses the SIMD units to their full potential and one that supports a dedicated sparse matrix multiply ASIC connected via some high-bandwidth bus. In the first case the caller doesn't have to worry about the implementation details; it is just a performance optimization. In the second case you have to use a more cumbersome API, because it has to deal with multiplexing the accelerator and moving data.
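To make the contrast concrete, here is a rough sketch of what the two call sites might look like. Everything in it is made up for illustration: `matmul` is a plain-Python stand-in for a BLAS-style call, and `SparseAccelerator` with its `upload`/`multiply`/`download` methods is a hypothetical toy, not the API of any real device or library.

```python
import threading


# Case 1: transparent optimization. The caller just multiplies; whether the
# library uses SIMD underneath is invisible at the call site.
def matmul(a, b):
    """Plain-Python stand-in for a BLAS-style dense matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Case 2: hypothetical accelerator API. All names below are assumptions.
class SparseAccelerator:
    """Toy model of a shared device that has its own memory."""

    def __init__(self):
        self._lock = threading.Lock()  # multiplexing: one user at a time
        self._device_mem = {}          # stands in for on-device buffers

    def __enter__(self):
        self._lock.acquire()           # claim the device
        return self

    def __exit__(self, *exc):
        self._device_mem.clear()       # free the device buffers
        self._lock.release()           # hand the device back

    def upload(self, name, matrix):
        # Explicit host-to-device copy.
        self._device_mem[name] = [row[:] for row in matrix]

    def multiply(self, a_name, b_name, out_name):
        # The "device" computes on its own buffers only.
        self._device_mem[out_name] = matmul(self._device_mem[a_name],
                                            self._device_mem[b_name])

    def download(self, name):
        # Explicit device-to-host copy.
        return [row[:] for row in self._device_mem[name]]


a = [[1, 0], [0, 2]]
b = [[3, 4], [5, 6]]

# Transparent case: one call, no device management.
c_transparent = matmul(a, b)

# Accelerator case: acquire, copy in, launch, copy out, release.
with SparseAccelerator() as dev:
    dev.upload("a", a)
    dev.upload("b", b)
    dev.multiply("a", "b", "c")
    c_offloaded = dev.download("c")
```

Both paths produce the same matrix, but only the second one makes the caller deal with who owns the device and where the data lives.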