	by crest 2912 days ago
	They do not. The difference is the same as that between a BLAS implementation that uses the SIMD units to their full potential and one that drives a dedicated sparse matrix multiply ASIC connected via some high-bandwidth bus. In the first case the caller doesn't have to worry about the implementation details; it is just a performance optimization. In the second case you have to use a more cumbersome API, because it has to deal with multiplexing the accelerator and moving data.
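	To make the contrast concrete, here is a rough sketch of what the two call styles might look like from the caller's side. Everything in it is hypothetical: `matmul`, `FakeAccelerator`, and its `session`/`upload`/`multiply`/`download` methods are names made up for illustration, not any real BLAS or driver API, and the "device" is just a dict guarded by a lock.

```python
import threading
from contextlib import contextmanager


def matmul(a, b):
    """Case 1: a plain call. Whether SIMD is used inside is invisible to the caller."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class FakeAccelerator:
    """Case 2: hypothetical stand-in for an off-chip matrix-multiply device."""

    def __init__(self):
        self._lock = threading.Lock()  # the device is shared, so access is multiplexed
        self._buffers = {}             # pretend device memory, keyed by handle
        self._next_handle = 0

    @contextmanager
    def session(self):
        # The caller must acquire the device before use and release it afterwards.
        with self._lock:
            try:
                yield self
            finally:
                self._buffers.clear()  # free device memory at the end of the session

    def upload(self, matrix):
        # Explicit host-to-device copy; returns an opaque handle.
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = [list(row) for row in matrix]
        return handle

    def multiply(self, ha, hb):
        # The result stays on the device until the caller asks for it.
        return self.upload(matmul(self._buffers[ha], self._buffers[hb]))

    def download(self, handle):
        # Explicit device-to-host copy.
        return [list(row) for row in self._buffers[handle]]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Case 1: one call, nothing else to think about.
c1 = matmul(a, b)

# Case 2: acquire the device, move data in, launch, move data out.
dev = FakeAccelerator()
with dev.session() as s:
    c2 = s.download(s.multiply(s.upload(a), s.upload(b)))
```

	Both paths compute the same product, but only the second forces the caller to think about who owns the device and where the data lives.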