


	by howeman 3192 days ago
	For small vectors and matrices, the cgo call overhead swamps the assembly speedups. For large vectors, cache misses dominate, so the assembly doesn't matter as much. It does matter significantly for medium vectors and large matrices. For those cases we provide cgo wrappers and are working on SIMD kernels.