by dragandj
3190 days ago
It's a useful trick, but I was talking about a comparison with what MKL and cuBLAS and other Nvidia libraries do. For the float/double/half implementations of the "standard" operations that 95% of use cases fall into, of which there are hundreds if not thousands, I haven't seen anything that comes even close. Of course, if you need something special, there are various techniques. My preference for those things is to skip the middleman and code them in CUDA/OpenCL kernels directly... OTOH, I'm interested in machine learning applications rather than physics/engineering. Double is overkill here, float is the sweet spot, and some techniques use half or even lower precision. Quad precision may just not be something I need, but I suppose the people who do need it have their own ways of solving that.