Hacker News new | ask | show | jobs
by adrian_b 572 days ago
While there is some truth in what you say, it makes seem like writing in the CUDA style is something new and revolutionary invented by NVIDIA, which it is not.

The CUDA style of writing parallel programs is nothing else than the use of the so-called "parrallel do" a.k.a. "parrallel for" program structure, which has been already discussed in 1963. Notable later evolutions of this concept have been present in "Communicating Sequential Processes" by C.A.R. Hoare (1978-08: "arrays of processes"), then in the programming language Occam, which was designed based on what Hoare had described, then in the OpenMP extension of Fortran (1997-10), then in the OpenMP extension of C and C++ (1998-10).

Programming in CUDA does not bring anything new, except that in comparison e.g. with OpenMP some keywords are implicit and others are different, so the equivalence is not immediately obvious.

Programming for CPUs in the much older OpenMP is equivalent with programming in CUDA for GPUs.

The real innovation of NVIDIA has been the high quality of the NVIDIA CUDA compiler and CUDA runtime GPU driver, which are able to distribute the work that must be done on the elements of an array over all the available cores, threads and SIMD lanes, in a manner that is transparent for the programmer, so in many cases the programmer is free to ignore which is the actual structure of the GPU that will run the program.

Previous compilers for OpenMP or for other such programming language extensions for parallel programming have been much less capable to produce efficient parallel programs without being tuned by the programmer for each hardware variant.

2 comments

I'm not sure what you mean. All CUDA code needs to be aware of the programming model the GPU imposes on them, splitting their code manually into threads and warps and blocks and kernels to match. This isn't really transparent at all.
Oh all of this was being done in the 1980s by Lisp* programmers.

I'm not calling it new. I'm just saying that the intrinsics style is much much harder than what Lisp*, DirectX HLSL, CUDA, OpenCL (etc. etc) does.

A specialized SIMD language makes writing SIMD easier compared to intrinsic style. Look at any CUDA code today and compare it to the AVX that is in the above article and it becomes readily apparent.