The obvious problem with both of your points is that Nvidia's competitors did exactly that.
AMD has had workable Linux drivers for many years now, and numerous alternatives to CUDA have been pushed.
A common talking point is that CUDA is a formidable moat for Nvidia, but, as someone who has never done AI dev, I'm curious to understand what makes CUDA so sticky. From an outsider's perspective it looks like a rerun of DirectX vs. everything else, but AI is not like gaming, and end users often don't have to run the model themselves. So it seems like the network effects should be weaker than for a graphics API.
I don't know how it is nowadays, but I remember trying CUDA back when the GeForce GTX 280 was still a high-end GPU. I didn't do anything fancy; I just tried to write a simple raytracer to get a feel for how it would work.
The experience was incredibly simple: write C as usual, annotate a few functions with some extra keywords, and compile with a custom frontend/preprocessor/whatever-nvcc-was instead of gcc. (I was on Linux, and by the way I strongly contest the notion that Nvidia drivers on Linux were a "nightmare": they always worked just fine, with performance and features comparable to their Windows counterparts, while ATi/AMD had buggy and broken drivers for years.) Again, the experience was very simple; I even just copy/pasted a bunch of existing C code I had and it worked.
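For anyone who hasn't seen that style, here is a minimal sketch of it, assuming a current CUDA toolkit with nvcc rather than the GTX 280-era one; the names `scale_one`, `scale` and `scale_on_gpu` are hypothetical. One `.cu` file holds both the GPU code and the ordinary host code, and the extra keywords are the `__global__` / `__host__ __device__` qualifiers plus the `<<<blocks, threads>>>` launch syntax that nvcc understands.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Plain C-style function; the qualifiers let nvcc compile it for both CPU and GPU.
__host__ __device__ float scale_one(float x, float factor) {
    return x * factor;
}

// __global__ marks a kernel: launched from host code, runs on the GPU.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                     // guard the last partial block
        data[i] = scale_one(data[i], factor);
    }
}

// Ordinary host code in the same file: allocate, copy in, launch, copy out.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t scale_on_gpu(float* host, int n, float factor) {
    if (n <= 0) return cudaSuccess;  // a zero-block launch would be an error
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);
    float* dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale<<<blocks, threads>>>(dev, factor, n);  // nvcc lowers this to a runtime launch
        err = cudaGetLastError();                    // catches launch-configuration errors
    }
    if (err == cudaSuccess) {
        // On the default stream this copy is ordered after the kernel, so it sees the results.
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err;
}
```

Something like `nvcc scale.cu -o scale` builds it; note that `scale_one` stays callable from ordinary host code too, which is why pasting existing C functions in tends to just work.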
Later I tried OpenCL, which was supposedly the open alternative. That one felt far more primitive and low-level, like writing shaders without the shading bits.
In a way, as you wrote, it was kind of like DirectX: CUDA was like using OpenGL 1.1, with its convenient and straightforward C API, while OpenCL was like using DirectX 3, with its COM-infested execute-buffer nonsense.
After that I never really used CUDA (or OpenCL, for that matter), but it left me with the impression that Nvidia put far more effort into developer experience.
Nvidia have invested a lot in CUDA, and they have C & Fortran bindings for a lot of scientific stuff, apart from all the DL/Gen AI stuff that's super hot right now.
Like, I started using CUDA (through frameworks) over ten years ago, and basically nobody has come up with anything competitive since then.
This is a significant understatement. For quite some time Jensen has repeatedly said that 30% of their R&D spend goes to software. With the money-printing machine that is Nvidia, if that holds, they're going to keep rocketing ahead of competitors in delivering actual solutions.
The "What are you talking about? AMD/Intel runs torch just fine!" crowd clearly haven't seen things like Riva, DeepStream, NeMo, Triton Inference Server/NIM, etc. Meanwhile AMD (ROCm) still struggles with flash attention...
What these hardware-first (only?) companies like AMD don't seem to understand is that people buy solutions, not GPUs. It just so happens that GPUs are the best way to run these kinds of workloads, but if you don't have a holistic and exhaustive ecosystem, you end up with a single-digit market share versus Nvidia's ~90%.
Chicken-and-egg arguments. Good points, and not untrue, but look elsewhere in this thread and you'll see extensive antitrust behavior, questionable licensing practices, deceptive public statements, and deceptive handling of binary blobs. Very much like Intel: excellent tech in certain places, very mob-like business behavior in others.
"What are you talking about? AMD/Intel runs torch just fine!" refers indirectly to the value of having competition in markets, rather than jumping on the (well-funded, slick) monopoly bandwagon.
Since CUDA 3.0, Nvidia has embraced a polyglot stack, with C, C++ and Fortran at the center, and PTX for anyone else.
This was followed by changing the CUDA memory model to map to that of C++11.
Khronos never cared for Fortran, and only designed SPIR when it became obvious they were too late to the party.
So not only does CUDA have first-class tooling for C, C++ and Fortran, with IDE integration in Visual Studio and Eclipse and a graphical GPU debugger with all the goodies of a modern debugger; it also welcomes any compiler toolchain that wants to target PTX.
Java, Haskell, .NET, Julia, Python JITs... there are plenty to choose from, without going through the "compile to OpenCL C99" alternative.
The real moat of CUDA is that CUDA... works. It simply works out of the box, even on the cheapest GPUs. Unless you want some specific high-end features, everything will work on the cheapest GPU of a given generation, with the same base tooling.
And because of that, their OpenCL implementation also works better than others'. So more tooling builds on it, and not just from Nvidia, because it. just. works.
Compare this with AMD, whose latest framework is a total mess of "will it work on this GPU?", sometimes needing custom wrangling to enable, etc., and is effectively supported only on the most expensive compute-only cards.
The difference is not just about APIs; CUDA has a single-source model that is dead easy to use, whereas, last I checked, every competitor still had an outdated manual kernel-loading process that adds significant friction.
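To make the "manual loading" friction concrete, here is a sketch of that style using CUDA's own lower-level path (the NVRTC runtime compiler plus the driver API), which roughly mirrors the OpenCL sequence of `clCreateProgramWithSource`, `clBuildProgram`, `clCreateKernel`, `clSetKernelArg` and `clEnqueueNDRangeKernel`. It assumes CUDA 11 or later and uses device 0; the kernel and the `scale_with_manual_loading` helper are hypothetical. In single-source CUDA the same work is one `__global__` function and a `<<<blocks, threads>>>` launch in the same file.

```cuda
#include <cuda.h>
#include <nvrtc.h>
#include <cstddef>
#include <vector>

// Kernel source lives in a string and is only compiled at run time.
static const char* kKernelSource = R"(
extern "C" __global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
})";

// Manual path: compile string -> PTX, load module, look up kernel by name,
// marshal arguments as pointers, launch, clean up. Returns false on any failure.
bool scale_with_manual_loading(float* host, int n, float factor) {
    if (n <= 0) return true;

    // 1. Compile the source string to PTX with NVRTC.
    nvrtcProgram prog;
    if (nvrtcCreateProgram(&prog, kKernelSource, "scale.cu", 0, nullptr, nullptr) != NVRTC_SUCCESS)
        return false;
    if (nvrtcCompileProgram(prog, 0, nullptr) != NVRTC_SUCCESS) {
        nvrtcDestroyProgram(&prog);
        return false;
    }
    std::size_t ptxSize = 0;  // includes the trailing NUL
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);

    // 2. Set up a context (the device's primary context) by hand.
    CUdevice dev;
    CUcontext ctx;
    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return false;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return false;
    bool ok = cuCtxSetCurrent(ctx) == CUDA_SUCCESS;

    CUmodule mod = nullptr;
    CUfunction fn = nullptr;
    CUdeviceptr buf = 0;
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);

    // 3. Load the PTX (JIT-compiled for this GPU), find the kernel by name, move data.
    ok = ok && cuModuleLoadData(&mod, ptx.data()) == CUDA_SUCCESS;
    ok = ok && cuModuleGetFunction(&fn, mod, "scale") == CUDA_SUCCESS;
    ok = ok && cuMemAlloc(&buf, bytes) == CUDA_SUCCESS;
    ok = ok && cuMemcpyHtoD(buf, host, bytes) == CUDA_SUCCESS;

    // 4. Arguments go in as an untyped array of pointers to each value.
    void* args[] = {&buf, &factor, &n};
    const unsigned threads = 256;
    const unsigned blocks = (static_cast<unsigned>(n) + threads - 1) / threads;
    ok = ok && cuLaunchKernel(fn, blocks, 1, 1, threads, 1, 1, 0, nullptr, args, nullptr) == CUDA_SUCCESS;
    ok = ok && cuCtxSynchronize() == CUDA_SUCCESS;
    ok = ok && cuMemcpyDtoH(host, buf, bytes) == CUDA_SUCCESS;

    // 5. Manual cleanup.
    if (buf) cuMemFree(buf);
    if (mod) cuModuleUnload(mod);
    cuDevicePrimaryCtxRelease(dev);
    return ok;
}
```

Something like `nvcc manual.cu -lnvrtc -lcuda` builds it. Every step the single-source model hides (compiling a string, loading a module, looking a kernel up by name, passing arguments as untyped pointers) is explicit here, and a typo in the kernel name or argument list only shows up at run time.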
It is supposed to, yes. I was never able to set it up (admittedly I have not tried in a couple of years, since I am not working with GPUs anymore), so I don't know how well it holds up.