by gcp 3404 days ago
OpenBLAS performance is atrocious in 32-bit mode because it doesn't properly support AVX with the halved register file. Not the most common configuration, but MKL handles it fine (on Intel chips, obviously).

That said, I agree it makes more sense for AMD to contribute to OpenBLAS than anything else.

1 comments

Interesting -- what are the use cases for single precision BLAS on CPU? All the scientific software I use requires double precision and for tasks that do well with single precision, I would have thought that GPGPU would now be the go-to solution.
Not if you're shipping software to consumers. (Also, I actually meant 32-bit as in the OS, not the floating-point precision.)