by dragandj 3190 days ago

It's a useful trick, but I was talking about a comparison with what MKL, cuBLAS, and the other Nvidia libraries do. For the float/double/half implementations of the "standard" operations that cover 95% of use cases, and there are hundreds if not thousands of those, I haven't seen anything that even comes close.

Of course, if you need something special, there are various techniques. My preference for those things is to skip the middleman and code them in CUDA/OpenCL kernels directly...

OTOH, I'm interested in machine learning applications rather than physics/engineering. Double is overkill here, float is the sweet spot, and some techniques use half or even lower precision. Quad precision may just not be something I need, but people who do need it presumably have their own ways of solving that.
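
For a rough sense of the gap between those precisions, here is a minimal sketch using only Python's standard `struct` module. It is an illustration, not anything taken from MKL or cuBLAS; `round_trip` is a hypothetical helper name, and 0.1 is an arbitrary test value. It stores the same number as half, float, and double and reports how much of it survives.

```python
import struct

def round_trip(value, fmt):
    """Pack a Python float (an IEEE 754 double) into `fmt` and read it back.

    `round_trip` is a hypothetical helper used only for this illustration.
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

x = 0.1  # arbitrary value that is not exactly representable in binary

half = round_trip(x, "<e")    # 16-bit half precision
single = round_trip(x, "<f")  # 32-bit single precision ("float")
double = round_trip(x, "<d")  # 64-bit double precision, same as Python's float

for name, v in (("half", half), ("float", single), ("double", double)):
    rel_err = abs(v - x) / x
    print(f"{name:>6}: {v!r}  relative error {rel_err:.1e}")
```

Half keeps roughly three significant decimal digits and float roughly seven; whether that is enough depends on the workload.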