In 2000 Intel had a huge software moat: Microsoft Windows, and the large install base of x86-only software.
Rich webapps hadn't been invented. Smartphones? If you're lucky your flip phone might have a colour screen. If you've got money to burn, you can insert a PCMCIA card into your Compaq iPAQ and try out this new "802.11b" thing. Java was... being Java.
Almost all the software out there - especially if it had a GUI, and a lot of it did - was distributed as binaries that only ran on x86.
So many devs are too young to remember a time before you would expect to just download some open source and compile it for x86/amd64/arm/emscripten/etc and be good to go. In the old days, if you didn't want to write that library code yourself, chances are all your AltaVista search would turn up was a guy selling a header file and a DLL and OCX[0] for $25. If you were lucky!
A vast amount of code was only intended to compile and run on a single OS and architecture (circa 2000, that was usually x86 Win32; Unix was dying and Wintel had taken over the world). If some code needed to be ported to another platform, it was as good as a from-scratch re-write.
[0] in case you wanted to use the thing in Visual Basic, which you very well might.
"had". That's what helped prop up their monopoly, but it didn't last. These days, if you can't run your software on another architecture like ARM, you can at least run it on AMD. AMD can basically run the same software as Intel. This isn't the situation for NVIDIA vs everyone else, so far.
Other than the huge amount of enterprise software which was only supported on Intel, most of the high-end server business below the mainframe level after the mid-90s, and the huge install base of x86 software keeping everyone but AMD out? Even their own Itanium crashed and burned on x86 compatibility.
Then there were software libraries and the Intel C/C++ Compiler that favored Intel. They would place optimized code paths that only ran on Intel hardware in third party software. Intel has stopped doing that in recent years as far as I know (the MKL has Zen specific code paths), but that is a fairly recent change (maybe the past 5 years).
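To make the distinction concrete, here is a toy sketch of vendor-gated versus feature-gated dispatch. The `Cpu` class and both `pick_path_*` functions are hypothetical names made up for illustration; this is not the Intel compiler's or MKL's actual dispatcher, only the general shape of the two approaches. The vendor strings are the real CPUID ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    vendor: str          # CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset  # hypothetical set of ISA extensions the CPU reports


def pick_path_by_vendor(cpu: Cpu) -> str:
    """Vendor-gated: the fast path is taken only when the vendor string is Intel's."""
    if cpu.vendor == "GenuineIntel" and "avx2" in cpu.features:
        return "avx2"
    return "baseline"


def pick_path_by_feature(cpu: Cpu) -> str:
    """Feature-gated: the fast path is taken on any CPU that reports the extension."""
    if "avx2" in cpu.features:
        return "avx2"
    return "baseline"


# Two made-up CPUs that report identical feature sets.
intel = Cpu("GenuineIntel", frozenset({"sse2", "avx2"}))
amd = Cpu("AuthenticAMD", frozenset({"sse2", "avx2"}))

for cpu in (intel, amd):
    print(cpu.vendor, pick_path_by_vendor(cpu), pick_path_by_feature(cpu))
```

With the vendor check, the second CPU lands on the baseline path even though it reports the same extension; with the feature check, both take the fast path.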
There were also ISA extensions. Even if Intel had trouble competing on existing code, they would often extend the ISA to gain a temporary advantage over their competitors by enabling developers to write more optimal code paths that would run only on Intel’s most recent CPUs. They have done less of that ever since the AVX-512 disaster, but Intel still is the one defining ISA extensions and it historically gained a short term advantage whenever it did.
Interestingly, the situation is somewhat inverted as of late given Intel’s failure to implement the AVX-512 family of extensions in consumer CPUs in a sane way, when AMD succeeded. Intel now is at a disadvantage to AMD because of its own ISA extension. They recently made AVX10 to try to fix that, but it adds nothing that was not already in AVX-512, so AMD CPUs after Zen 3 would have equivalent code paths from AVX-512, even without implementing AVX10.
https://www.realworldtech.com/physx87/3/ "For Nvidia, decreasing the baseline CPU performance by using x87 instructions and a single thread makes GPUs look better."
They doubled down on that approach with 'GameWorks', which crippled performance on non-Nvidia GPUs. Nvidia paid studios to include GameWorks in their games.
NVidia has a software moat for specialized applications but not for AI, which is responsible for most of their sales now. Almost everyone in AI uses pytorch/jax/triton/flash attention and not CUDA directly. And if Google can support pytorch for their TPU and Apple for their M1 GPU, surely others could.
> NVidia has a software moat for specialized applications but not for AI, which is responsible for most of their sales now. Almost everyone in AI uses pytorch/jax/triton/flash attention and not CUDA directly
And what does pytorch et al. use under the hood? cuBLAS and cuDNN, proprietary libraries written by NVidia. That is where most of the heavy lifting is done. If you think that replicating the functionality and performance that these libraries provide is easy, feel free to apply for a job at NVidia or their competitors. It is pretty well paid.
Did you read the last part? Pytorch uses drivers, and drivers exist for Google's TPU and Apple's M1 GPU as well, and both work pretty well. I have tested both, and they reach MFU similar to Nvidia's.
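For anyone unfamiliar with the term, MFU (model FLOPs utilization) is the fraction of a chip's peak FLOP/s that a training run actually uses. A rough sketch of the arithmetic, assuming the common approximation of about 6 FLOPs per parameter per token for training a dense transformer (attention FLOPs ignored). The function name and every number below are made up for illustration; none of them are measurements from this thread.

```python
def mfu(params: int, tokens_per_second: float, peak_flops_per_second: float) -> float:
    """Model FLOPs utilization under the ~6 * params FLOPs-per-token approximation."""
    flops_per_token = 6 * params  # forward + backward pass, attention ignored
    achieved_flops_per_second = flops_per_token * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical run: 1B-parameter model, 10k tokens/s, chip with 100 TFLOP/s peak.
example = mfu(
    params=1_000_000_000,
    tokens_per_second=10_000,
    peak_flops_per_second=100_000_000_000_000,
)
print(f"MFU: {example:.0%}")
```

Because MFU is normalized by each chip's own peak, similar MFU on two accelerators means similar efficiency, not similar absolute throughput.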
Maybe on a particular model/dataset but extremely unlikely in general. Again, like another commenter pointed out: if you truly believe it isn't that hard we would love to hire you at Meta ;)
Yes, some operations are not supported on MPS/TPU and fall back to the slower CPU. But for common architectures like transformers and convnets, it works very well for all the datasets.
I never claimed it was easy. I meant that, in my opinion, it is on the order of tens of millions of dollars of investment, not the trillion-dollar CUDA moat that people talk about here.
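A toy sketch of what "falls back to CPU" means in practice: each op runs on the accelerator if a kernel exists for it, otherwise on the CPU. The op names, the `SUPPORTED` set, and both functions are hypothetical; this is not PyTorch's actual dispatch mechanism, just the idea.

```python
def dispatch(op: str, accelerator_ops: set, accelerator: str = "mps") -> str:
    """Return the device an op would run on: the accelerator if it has a kernel, else CPU."""
    return accelerator if op in accelerator_ops else "cpu"


def fallback_fraction(ops: list, accelerator_ops: set) -> float:
    """Fraction of ops in a model that would fall back to the CPU."""
    on_cpu = [op for op in ops if dispatch(op, accelerator_ops) == "cpu"]
    return len(on_cpu) / len(ops)


# Hypothetical kernel coverage and a hypothetical model's op list.
SUPPORTED = {"matmul", "softmax", "layer_norm", "conv2d"}
model_ops = ["matmul", "softmax", "layer_norm", "matmul", "exotic_scatter"]

for op in model_ops:
    print(op, "->", dispatch(op, SUPPORTED))
print("fallback fraction:", fallback_fraction(model_ops, SUPPORTED))
```

A model built only from well-covered ops never hits the fallback, which is why common transformer and convnet architectures can do fine while more unusual models suffer.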
Are M1 GPUs available for data center deployment at scale? Are Google TPUs available outside of Google? Can Amazon or Microsoft or other third parties deploy them?
Anyone that wants off the shelf parts at scale is going to turn to Nvidia.
The Pentium math bug, Puma cable modems, their shitty cellular modems that are far worse than Qualcomm's, gigabit chipset issues, 2.5GbE chipset issues, and now the 13th/14th gen CPUs that destroy themselves.
And we just gave them billions in tax dollars. Failing upwards...