| While there is some truth in what you say, it makes seem like writing in the CUDA style is something new and revolutionary invented by NVIDIA, which it is not. The CUDA style of writing parallel programs is nothing else than the use of the so-called "parrallel do" a.k.a. "parrallel for" program structure, which has been already discussed in 1963. Notable later evolutions of this concept have been present in "Communicating Sequential Processes" by C.A.R. Hoare (1978-08: "arrays of processes"), then in the programming language Occam, which was designed based on what Hoare had described, then in the OpenMP extension of Fortran (1997-10), then in the OpenMP extension of C and C++ (1998-10). Programming in CUDA does not bring anything new, except that in comparison e.g. with OpenMP some keywords are implicit and others are different, so the equivalence is not immediately obvious. Programming for CPUs in the much older OpenMP is equivalent with programming in CUDA for GPUs. The real innovation of NVIDIA has been the high quality of the NVIDIA CUDA compiler and CUDA runtime GPU driver, which are able to distribute the work that must be done on the elements of an array over all the available cores, threads and SIMD lanes, in a manner that is transparent for the programmer, so in many cases the programmer is free to ignore which is the actual structure of the GPU that will run the program. Previous compilers for OpenMP or for other such programming language extensions for parallel programming have been much less capable to produce efficient parallel programs without being tuned by the programmer for each hardware variant. |