Hacker News new | ask | show | jobs
by crest 2865 days ago
They do not. The difference is the same as having a BLAS implementation that uses the SIMD units to their potential and one that supports a dedicated sparse matrix multiply ASIC connected via some high bandwidth bus. In the first case the caller doesn't have to worry about the implementation details it is just a performance optimization. In the other case you have to use an API that has to be more cumbersome to deal with multiplexing the accelerator and moving data.