
Sure, sorry for the delay. For sparse matrices, CUDA uses the same CSR (compressed sparse row) text format for input, and the same compressed form on the chip, that CPUs do. NVIDIA takes some of the parsed tiles, parallelizes the computation across the GPU, and sets the zero values aside in memory to throw back in at the end of the computation. That is still a fine way to do it, and it gets you GPU performance over CPU performance, but NVIDIA never redesigned the CSR intake format.
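For anyone who hasn't seen CSR before, here is a minimal pure-Python sketch of the layout and of a sparse matrix-vector multiply (SpMV) over it. The function names `dense_to_csr` and `csr_spmv` are my own, made up for illustration; this is not the CUDA or book code, just the textbook three-array form (values, column indices, row pointers).

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix to CSR (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:              # only nonzeros are stored
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts in `values`
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x with A given in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Row i owns the slice values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


A = [[4, 0, 0, 1],
     [0, 0, 2, 0],
     [0, 3, 0, 0],
     [5, 0, 0, 6]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_spmv(values, col_idx, row_ptr, [1, 2, 3, 4])
```

Each row's loop is independent of the others, which is the part a GPU can run in parallel, for example one row per thread.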

OpenCL does something similar with the null (zero) values, but it redesigns the entire compression format used to read in the matrix data, and it accesses the tiles of data differently during the computation, based on the format it uses for parsing and accessing the data.
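To give a rough feel for what "tiles" means here, below is a generic sketch of tile-based storage in plain Python. To be clear, this is NOT the format from the book; I'm assuming fixed 2x2 tiles and a dictionary keyed by tile origin purely for illustration, and `dense_to_tiles` and `tiled_spmv` are hypothetical names. The only idea it shows is that all-zero tiles are never stored or visited.

```python
def dense_to_tiles(dense, tile=2):
    """Split a dense matrix into tile x tile blocks, keeping only blocks
    that contain at least one nonzero (illustrative layout, not the book's)."""
    n_rows, n_cols = len(dense), len(dense[0])
    tiles = {}
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            block = [row[c0:c0 + tile] for row in dense[r0:r0 + tile]]
            if any(v != 0 for brow in block for v in brow):
                tiles[(r0, c0)] = block  # key = top-left corner of the tile
    return tiles


def tiled_spmv(tiles, x, n_rows):
    """Compute y = A @ x, visiting only the stored (nonzero) tiles."""
    y = [0.0] * n_rows
    for (r0, c0), block in tiles.items():
        for di, brow in enumerate(block):
            for dj, v in enumerate(brow):
                y[r0 + di] += v * x[c0 + dj]
    return y


M = [[4, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 2, 3]]
tiles = dense_to_tiles(M)
y_tiled = tiled_spmv(tiles, [1, 2, 3, 4], len(M))
```

In this toy case only two of the four tiles are kept, so half of the matrix is never touched during the multiply.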

I have been trying to find a good explanation online, but all I can find is what I have in Chapter 22 (on SpMV) of the OpenCL 1.2 book. The authors visualize and explain CUDA's format, then OpenCL's format, and provide performance comparisons between the two methods using the standard set of 22 matrices provided by the University of Florida.

OpenCL's performance was better on every matrix: the smallest improvement cut computation time in half, and the rest brought it down to roughly a third. The authors ran their own CUDA tests, but they also include the results from NVIDIA's own whitepaper, and they use NVIDIA's reported numbers as the official comparison.

For a steady influx of repetitive real-time data, reading in millions of data points every few seconds, that kind of advantage is not negligible.

It took a long time to work through, but now I reap the benefits every few seconds on large datasets, indefinitely, so in this case I found it worth diving into the performance specs.

While I can't find an online visual of the CUDA vs. OpenCL designs explained in Chapter 22, the source code for the standard SpMV I'm talking about is on the GitHub of one of the authors; the username is "bgaster". You should be able to download the code, run it cross-platform, read in matrices of your choice, and compare performance yourself if you have datasets you want to look at.

I highly recommend the OpenCL book. It explains how to use OpenCL and provides comprehensive examples, walking through concepts with code over 22 chapters. The source code for the book is there with the rest of bgaster's stuff. It's definitely not a trivial language to take on, and the learning curve is steep. However, I learned GPUs through OpenCL, so I don't know whether it would be easier coming from CUDA, setting aside whatever expectations someone already reaping the benefits of CUDA might bring with them.

I initially took a couple of weeks to read through the entire OpenCL 1.2 API. It's one of the most comprehensive and detailed APIs I've ever read, and I find myself disappointed with other groups' documentation in comparison. Now that I'm familiar with the scope of functionality available, I typically keep the API PDF open and search it (Ctrl+F) for concepts to find what's available to me as I code. The most recent API is just as good and can be found here: https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf

Also, one of the authors of OpenCL works full-time at Apple now, and OpenCL comes installed and simply working on any version of OS X. So if you happen to have a Mac, it should be relatively easy to download some working source code and try it out.

I have OS X, and I was also able to download an SDK for the NVIDIA graphics card in an ASUS ZenBook running Fedora 24. I got it up and running just fine in an hour or so, including install and testing.