|
|
|
|
|
by howeman
3192 days ago
|
|
For small vectors and matrices the cgo overhead swamps the assembly speedups. For large vectors cache misses dominate, and the assembly doesn't matter as much. It does matter significantly for medium vectors and large matrices. In that case we provide cgo wrappers and are working on SIMD kernels. |
|