by tombert 2510 days ago
This seems pretty cool, and I'll probably play with this at some point, but sadly literally all of my GPUs are AMD or Intel at this point.

I'm sure you had a good reason, so I'm genuinely curious as to why CUDA was chosen instead of something like OpenCL?

(I'll add my typical disclaimer that I'm not saying this as some passive-aggressive way to criticize; I'm genuinely curious about the reasoning behind the choice.)

That's a great question. The answer is two-fold.

Early on, when we first started playing around with general-purpose processing on GPUs, we had Nvidia cards to begin with, and I started looking at the APIs that were available to me.

The CUDA ones were easier for me to get started with, had tons of learning content that Nvidia provided, and were more performant on the cards that I had at the time compared to other options. So we built up lots of expertise in this specific way of coding for GPUs. We also found time and time again that it was faster than OpenCL for what we were trying to do, and the hardware available to us on cloud providers was Nvidia GPUs.

The second answer to this question is that BlazingSQL is part of a greater ecosystem, rapids.ai, and the largest contributor by far is Nvidia. We are really happy to be working with their developers to grow this ecosystem, and that means that the technology will probably be CUDA-only unless we somehow program "backends" like they did with Thrust, but that would be eons away from now.

> We also found time and time again that it was faster than OpenCL for what we were trying to do, and the hardware available to us on cloud providers was Nvidia GPUs.

Were some benchmarks done perhaps or could you provide some more low-level reasons as to why CUDA was more performant? I'm not experienced with CUDA, just generally interested.

I also have to say that I am a bit skeptical of Nvidia as I have never received any proper support for Linux development on Nvidia GPUs for drivers and generally tracking bugs on their cards. It was so frustrating that I just switched to AMD GPUs that "just worked". How is this different for these kinds of use cases? Does Nvidia only care about their potential enterprise customers but they don't care about general usage of their GPUs on Linux? It seems to rub me the wrong way and I don't understand.

Nvidia loves and cherishes you (I think; I don't work there). They want you to be able to do this on your laptop, your server, your supercomputer.

If it has been a few years, I would encourage you to get your feet wet again because support has gotten a lot better. It's not like 5 years ago, when it was nigh impossible to get the driver installed and weird conflicts would come up. I generally recommend using the Debian installer if that works for you. Rapids is meant to make data science at scale accessible to people. If you have trouble with CUDA, drop by https://rapids-goai.slack.com. There are many people there who are willing to help.

Do you use Nvidia products on Linux? Reading "love" and "Nvidia" in the same sentence feels a little bit odd because the general sentiment for Nvidia in the Linux community is "don't touch it with a 10-foot pole". If I remember correctly, Torvalds himself named it the worst hardware company they had to deal with.
I'm not sure what you're talking about. Besides games, Linux has been the de facto OS for anything serious with CUDA for almost as long as CUDA has existed. What exactly is the problem with it?
I think this sentiment exists solely among people that don't actually own any NVIDIA hardware. I've never had any problems with their drivers, and any crashes in video games can usually be attributed, at least in part, to the game. In contrast to Windows, Linux has abysmal support for restarting crashed video drivers.
Linus Torvalds's kernel developer point of view might be very different from the majority of users'. For the end users, they just need to install Nvidia's proprietary drivers and everything just works.

For a long time, Nvidia was the best option for 3D graphics on Linux. ATI/AMD had terrible drivers (fglrx/Catalyst), Intel had abysmal performance.

>For the end users, they just need to install Nvidia's proprietary drivers and everything just works.

And that's the crux of the issue. Proprietary drivers.

We exclusively do Nvidia/Linux.

With nvidia-docker (a multi-year effort at this point) and AMIs, esp. in the era of ML, this is a non-issue for 80% of our users. The other 20% struggle even without the GPUs. ML is a thing and GPUs run it, so the community has come together here.

Linux laptops remain a mess in general tho, which is annoying for non-cloud dev =/

> blazingsql is part of a greater ecosystem

But now blazingsql is part of an ecosystem within a walled garden fully dependent upon the stability of a single company.

Well, it pretty much always was a part of the ecosystem; it just was not open source. We have been contributors to rapids for a while. And yes, we are betting on Nvidia for sure.

For most people building GPGPU solutions, they are going to have to make a decision when it comes to which hardware they want to support. After that decision is made, it really isn't something you can revisit without copious amounts of money.

So, the part that confuses me with this argument is we live in an Intel world where they have 98% market share in servers. So we're already at the whim of a single company. Why not challenge that dominance?
Not the same. Two companies make x86 processors, and in the very specific case of this article/comment thread, more than one company supports OpenCL. Nvidia/CUDA is a one-pony show, no matter how you look at it.
Thanks for answering my question so quickly!

That seems like a pretty good reason...I have been looking to learn some GPU programming to optimize some matrix math that I've been doing for a pet project, and while my first instinct was telling me OpenCL since it's portable, if people who actually know what they're talking about are saying that CUDA is simpler to start with, it might be worth it to me to pick up a cheap Nvidia GPU/Jetson Nano and do some processing that way.

The Colab link below lets you use a GPU for free on Google Cloud
> OpenCL since it's portable

Even if you choose OpenCL, the tools (profiler, debugger, etc.) are usually platform-specific. In addition, my experience with OpenCL across platforms was that each of the vendors' compilers had distinct issues and that performance was not portable.

I get the appeal of an open API, but OpenCL never grew a development ecosystem or any libraries. IMO it is dying and isn't worth the effort. AMD is implementing CUDA with HIP - maybe roll with that.

You definitely do not want to use OpenCL for matrix multiplies on Nvidia cards. That's the most highly optimized task on GPUs, so much so that they have dedicated hardware units for it. OpenCL cannot take advantage of those.
CUDA has two APIs:

1. The runtime API (libcudart.so)

2. The driver API (libcuda.so)

The driver API is very close to the OpenCL API and is very low level. Most people use the CUDA runtime API, which is vastly more convenient. The main difficulty with OpenCL and the driver API is that you have to manually load GPU code onto the device, which then returns a handle. You generally have to load the code onto every device, which means multiple handles for the same function. This makes executing a kernel quite a lot of work. The runtime API does this all automatically, which makes programming with CUDA quite easy since launching a kernel is basically a function call. The CUDA runtime also automatically handles context creation, which is another time saver.

When I first learned OpenCL I was shocked at how difficult it was to write a simple vector add program, since there was all this additional code for loading, creating contexts, etc. The setup / boilerplate was greater than the actual code itself.

It basically boils down to convenience, in my opinion. Couple this with the fact that NVIDIA generally has the most powerful and energy-efficient cards and it's no surprise they took the market.
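
For anyone following along who hasn't seen it, here is a rough sketch of what that vector add can look like with the CUDA runtime API. The names vector_add and run_vector_add are made up for illustration, and the sizes and error handling are kept minimal; it assumes nvcc and an Nvidia GPU are available. Note there is no explicit context creation or module loading anywhere: the first runtime call sets up the context, and the kernel launch reads like a function call.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Hypothetical helper: runs the add on the GPU and checks it against the CPU.
// Returns true only if every runtime call succeeded and the results match.
bool run_vector_add(int n) {
    std::vector<float> a(n), b(n), out(n, 0.0f);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f * i; b[i] = 2.0f * i; }

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;

    // The first runtime call implicitly creates the context on device 0.
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMalloc(&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;  // round up
        // No module loading, no function handle: just launch the kernel.
        vector_add<<<blocks, threads>>>(d_a, d_b, d_out, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer is a no-op, so this is safe on early failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);

    // a[i] + b[i] should equal 3 * i for every element.
    for (int i = 0; ok && i < n; ++i)
        ok = std::fabs(out[i] - 3.0f * i) < 1e-5f;
    return ok;
}

int main() {
    const bool ok = run_vector_add(1 << 20);
    std::printf("vector add %s\n", ok ? "matched the CPU result" : "failed");
    return ok ? 0 : 1;
}
```

Something like `nvcc vector_add.cu -o vector_add` should build it. The equivalent with the driver API or OpenCL would additionally need explicit context setup plus loading the compiled kernel and looking up a function handle before the launch, which is the boilerplate being described above.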

> The driver API is very close to the OpenCL API and is very low level.

They are only realistically comparable from OpenCL 2.0 onwards. But no NVIDIA card supports anything beyond 1.2, and with that decision they basically killed OpenCL.

FWIW ROCm, which is what AMD has indicated they will be investing in moving forward, doesn't support OpenCL 2.0 either.
The open competitor to the runtime API is SYCL.
Except, how many cards are shipping production-quality SYCL drivers, or provide GPGPU SYCL graphical debuggers?
https://www.codeplay.com/products/computesuite/computecpp enables SYCL for all OpenCL devices, so Intel, AMD, Nvidia, FPGAs, a lot of things, and smartphones, which is an order of magnitude more devices than CUDA. Products targeting only Nvidia devices are mostly niche markets, which is pathetic. As for debuggers, CodeXL has been extended to support it.

Besides SYCL, there's Open{MP/ACC} GPU offloading, which has become viable and portable. There's also HIP/ROCm, which transpiles to OpenCL AND CUDA (best of both worlds?) and can transpile CUDA to HIP almost totally automatically. That's how AMD ported TensorFlow to OpenCL.

Smartphones? Which ones?

iOS with its ageing OpenCL drivers, or the new Metal Shader drivers?

Or Android, where Google would rather use its own languages, Renderscript and Halide?

Yes, some OEMs do happen to ship non-standard Android drivers that also support OpenCL, which require a vendor-specific SDK to be actually usable, and are thus not an option versus Renderscript or Halide.

Do you happen to actually know CodePlay? They got their name creating compilers with vectorization optimization for the PS3 and other game consoles.

Their ComputeCpp is a pivot into the GPGPU world, and they aren't doing the community edition just from the kindness of their hearts, but rather as a path into their products.

"If you want to do things with this release, be prepared to be a pioneer. This release is pre-conformance, which means that we do not implement 100% of the SYCL specification. We currently only support Linux and two OpenCL implementations, by Intel and AMD, but wider support is coming. You may find that some unsupported implementations of OpenCL work with ComputeCpp. That's great, but we don't officially support anything else (yet). Most of the open-source libraries being ported to SYCL are not completed yet. This means that you should only check out some of these projects if you want to do some development yourself. We are building a big vision here: large, complex software highly accelerated on a wide range of processors, entirely by open standards. So, please be patient, or work with us."

Feels like it still needs to mature a little bit.

Even Intel, despite their SYCL contributions to clang (experimental release last 31st July), has been developing in parallel their own extensions, Data Parallel C++, and no one knows in what form they will contribute them back to Khronos, if at all.

Meanwhile, CUDA has been developed to be language-agnostic from the get-go, with out-of-the-box support for C, C++, and Fortran. Now with Julia, Haskell, Java, and .NET support as well.

While Khronos kept banging the C is good enough message until it was too late for vendors to actually care about SPIR-V.

Have you compared performance between your suggested solutions and what can be achieved using hardware vendor platforms? If not, then what's kind of pathetic is how quickly you dismiss the people above who say they HAVE done this before.

If you have seen something we have not when it comes to performance then please by all means share it so we can learn!