Hacker News
by gj_78 2102 days ago
I really do not understand why a (very good) hardware provider is willing to create/direct/hint custom software for the users.

Isn't this exactly what a GPU firmware is expected to do ? Why do they need to run software in the same memory space as my mail reader ?

3 comments

NVIDIA employs more software engineers than hardware engineers.

> Why do they need to run software in the same memory space as my mail reader ?

It is a lot more expensive to build functionality and fix bugs in silicon than it is to do those same things in software.

At NVIDIA, we do as much as we possibly can in software. If a problem or bug can be solved in software instead of hardware, we prefer the software solution, because it has much lower cost and shorter lead times.

Solving a problem in hardware takes 2-4 years minimum, massive validation efforts, and has huge physical material costs and limitations. After it's shipped, we can't "patch" the hardware. Solving a problem in software can sometimes be done by one engineer in a single day. If we make a mistake in software, we can easily deploy a fix.

At NVIDIA we have a status for hardware bugs called "Won't Fix, Fix in Next Chip". This means "yes, there's a problem, but the earliest we can fix it is 2-4 years from now, regardless of how serious it is".

Can you imagine if we had to solve all problems that way? Wait 2-4 years?

On its own, our hardware is not a complete product. You would be unable to use it. It has too many bugs, it doesn't have all of the features, etc. The hardware is nothing without the software, and vice versa.

We do not make hardware. We make platforms, which are a combination of hardware and software. We have a tighter coupling between hardware and software than many other processor manufacturers, which is beneficial for us, because it means we can solve problems in software that other vendors would have to solve in hardware.

> I really do not understand why a (very good) hardware provider is willing to create/direct/hint custom software for the users.

Because we sell software. Our hardware wouldn't do anything for you without the software. If we tried to put everything we do in software into hardware, the die would be the size of your laptop and cost a million dollars.

You wouldn't buy our hardware if we didn't give you the software that was necessary to use it.

> Isn't this exactly what a GPU firmware is expected to do ?

Firmware is a component of software, but usually has constraints that are much more similar to hardware, e.g. long lead times. In some cases the firmware is "burned in" and can't be changed after release, and then it's very much like hardware.

> Isn't this exactly what a GPU firmware is expected to do?

The source data needs to appear on the GPU somehow. Similarly, the results computed on the GPU are often needed by code running on the CPU.

GPUs don’t run an OS and are limited. They can’t access the file system, and many useful algorithms (like a PNG image codec) are a poor fit for them. Technically I think they can access source data directly from system memory, but doing that is inefficient in practice, because GPUs have a special piece of hardware (called the copy command queue in D3D12, or the transfer queue in Vulkan) to move large blocks of data over PCIe.

That library implements an easier way to integrate CPU and GPU pieces of the program.

What do you mean about running in the same memory space? Your operating system doesn’t allow that. Is your concern about using host memory? This open source library doesn’t automatically use host memory; users of the library can write code that uses host memory, if they choose to.

How would firmware help me write heterogeneous bits of C++ code that can run on either CPU or GPU?
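
For anyone unsure what "heterogeneous" C++ means in practice, here is a minimal sketch, not the library's actual API. The `HD` macro and the `saxpy_element` / `saxpy_host` helpers are hypothetical names made up for illustration. `__CUDACC__`, `__host__` and `__device__` are the markers NVIDIA's nvcc compiler defines and understands. The idea is that one function body is compiled for the CPU by any C++ compiler and, in a CUDA build, for the GPU as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro for illustration: a CUDA compiler defines __CUDACC__,
// so the function is compiled for both host and device there; a plain C++
// compiler sees no annotation at all.
#if defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Single source of truth: callable from CPU code and, in a CUDA build,
// from inside a GPU kernel.
HD inline float saxpy_element(float a, float x, float y) {
    return a * x + y;
}

// CPU-side driver. A GPU build would call saxpy_element from a kernel
// instead of from this loop.
inline std::vector<float> saxpy_host(float a,
                                     const std::vector<float>& x,
                                     const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = saxpy_element(a, x[i], y[i]);
    }
    return out;
}
```

Built with an ordinary C++ compiler, this runs entirely on the CPU. The sketch leaves out launching a kernel and moving data to the GPU, which is the part that firmware alone cannot express in your source code.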

> What do you mean about running in the same memory space? Your operating system doesn’t allow that. Is your concern about using host memory?

Actually, the basis of our modern GPU compute platform is a technology called Unified Memory, which allows the host and device processors to share access to memory spaces. We think this is the way forward.

Of course, there's still the process isolation provided by your operating system.

IMHO, the question is not that we need code to run on CPUs and GPUs , we do need that, The question is whether the GPU seller has to control both sides. Until I buy a CPU from nvidia I want to keep some kind of independence.

When will we be able to use a future riscv-64 CPU with an nvidia GPU ? we will let the answer to nvidia ?

> IMHO, the question is not that we need code to run on CPUs and GPUs , we do need that, The question is whether the GPU seller has to control both sides.

The question is not about running code on CPUs, or running code on GPUs. It's about running code on both CPUs and GPUs at the same time. It's about enabling the code on the CPU and the code on the GPU to seamlessly interoperate with each other, communicate with each other, move objects and data to and from each other.

Who do you expect to make that happen?

> Until I buy a CPU from nvidia I want to keep some kind of independence

You can buy a CPU from NVIDIA; check out our Tegra systems. We also sell full systems, like DGX platforms, which use a 3rd party CPU.

> When will we be able to use a future riscv-64 CPU with an nvidia GPU ? we will let the answer to nvidia ?

Who else would answer this question?

Okay, you want to use <insert some future CPU> with our GPU.

Who is going to design and build the interconnect between the CPU and the GPU?

Who is going to provide the GPU driver?

The CPU manufacturer? Why would they do that? They don't make any money from selling NVIDIA products. Why should they invest effort in enabling that?

You can use this library to write code that runs on both RISC-V and a GPU! You seem to be pretty confused about what this library is. It’s not exerting any control. It’s open source! It’s strictly optional, and it only allows developers to do something they actually want: to write code that will compile for any type of processor that a modern C++ compiler can target.

Again, I see what you mean. I am even against nvidia advising the developers to use this or that C++ library (be it GNU). It is not their role to do that. We need smarter and more shining GPUs from nvidia, not software.

I would say .... The hardware must be sold independently of the software ... but it is a bit too complex, I know.

I'm not understanding your point at all. You don't think developers should be able to write C++ code for the GPU?

What do you even mean by 'it is not their role to do that' and 'hardware must be sold independently of the software'? Why are you saying this? Software interfaces are critical for all GPUs and all CPUs, just ask AMD & Intel. There is no such thing as CPU or GPU hardware independent of software. Plus, the specific library here is being sold independently of the hardware; it is doing exactly what you say you want. It's separate and doesn't require having any other nvidia hardware or software. (I can't think of any good reasons to use it without having some nvidia hardware, but it is technically independent, as you wish.)

> You don't think developers should be able to write C++ code for the GPU?

To be clear, I don't think nvidia-paid developers should be able to write C++ code for an nvidia-sold GPU. The world will be better if any developer (paid by nvidia or not) is able to write code for any GPU (sold by nvidia or not). It is not nvidia's role to say how or when software will be written. Their hardware is good and that's more than OK.

AI/CUDA code written specifically for nvidia is useless/deprecated in the long term. A lot of brain waste.

> It is not their role to do that.

You are incorrect.

NVIDIA employs more software engineers than hardware engineers.

> We need smarter and more shining GPUs from nvidia, not software.

Software is a part of the GPU. You get better GPUs by having hardware and software engineers collaborate.

It is extremely expensive to put features into hardware. It costs a lot of money and takes a very long time. It takes 2-4 years at a minimum to put features into hardware. And there are physical constraints; we only have so many transistors.

If we make a mistake in hardware, how are we supposed to fix it? At NVIDIA we have a status for hardware bugs called "Fix in Next Chip". The "Next Chip" is 2-4 years away.

So what do we do? We solve problems in software whenever possible. It's cheaper to do so, it has a quicker turnaround time, and most importantly, we can make changes after the product has shipped.

> I would say .... The hardware must be sold independently of the software ... but it is a bit too complex, I know.

We don't sell hardware and you don't want to buy hardware. Trust me, you wouldn't know what to do with it. It's full of bugs and complexity.

We sell a platform that consists of hardware and software. The product doesn't work without software.

If we tried to make the same product purely in hardware, the die would be the size of your laptop and would cost a million dollars.