by rowanG077 551 days ago
That makes no sense. Inference costs dwarf training costs pretty quickly if you have a successful product. Afaik there is no commodity hardware that can run state of the art models like chatgpt-o1.

> Afaik there is no commodity hardware that can run state of the art models like chatgpt-o1.

Stack enough GPUs and any of them can run o1. Building a chip for LLM inference is much easier than building a training chip.

Just because one cost dwarfs another does not mean that is where the most marginal value from developing a better chip will be, especially if other people are just doing it for you. If Google gets a good model, inference providers will be begging to run it on their platforms, or to just sell Google their chips. And as I said, inference chips are much easier.

The chip level is only a tiny part of the story. Training can happen with a big-boy variant of "it works on my machine". Inference requires a worldwide network of GPUs. The chip level is the last thing you will be worrying about.
Each GPU costs ~$50k. You need at least 8 of them to run mid-sized models. Then you need a server to plug those GPUs into. That's not commodity hardware.
more like ~$16k for 16 3090s. AMD chips can also run these models. The parts are expensive but there is a competitive market in processors that can do LLM inference. Less so in training.
> more like ~$16k for 16 3090s

I don't know where you got that price from, but 1x RTX 3090 is $1,900. 16x is ~$30,000.
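For anyone who wants to check the arithmetic, here is a quick sketch using only the two figures quoted in this thread. Neither price is verified; actual prices vary by seller and date.

```python
# Figures as quoted in the thread; neither is a verified market price.
GPU_COUNT = 16
CLAIMED_TOTAL_USD = 16_000       # "~$16k for 16 3090s"
QUOTED_UNIT_PRICE_USD = 1_900    # "1x RTX 3090 is $1,900"

# Unit price implied by the $16k claim.
implied_unit_price = CLAIMED_TOTAL_USD / GPU_COUNT

# Total implied by the $1,900 unit price.
quoted_total = QUOTED_UNIT_PRICE_USD * GPU_COUNT

print(f"Unit price implied by $16k total: ${implied_unit_price:,.0f}")
print(f"Total at $1,900 per card: ${quoted_total:,}")
```

So the two numbers differ by roughly a factor of two per card, which is about the gap you would expect between two different price points for the same GPU (which one is realistic is the actual disagreement here).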

> The parts are expensive

Now that we have invested ~$30k in GPUs, we only need to find a motherboard that can accommodate 16x PCIe 4.0 x16 GPUs, right? And we also need a CPU that can drive that many PCIe 4.0 lanes?

Well, neither exists, not in the server parts sector, let alone in client commodity hardware. In any case, you'd need two CPUs, so even with this imaginary motherboard we are already entering the server rack design space. And that costs hundreds of thousands of dollars.
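A rough lane budget makes the point concrete. This is only a sketch: `LANES_PER_CPU` is an assumed, illustrative per-socket figure (check the spec of whatever CPU you have in mind), and the calculation ignores lanes needed for storage, networking, or anything else.

```python
import math

GPU_COUNT = 16
LANES_PER_GPU = 16     # every card in a full x16 slot, as assumed above
LANES_PER_CPU = 128    # assumption: illustrative per-socket lane count, not a quoted spec


def lane_budget(gpu_count: int, lanes_per_gpu: int, lanes_per_cpu: int) -> tuple[int, int]:
    """Return (total lanes required, minimum CPU sockets to supply them)."""
    total_lanes = gpu_count * lanes_per_gpu
    sockets = math.ceil(total_lanes / lanes_per_cpu)
    return total_lanes, sockets


total_lanes, sockets = lane_budget(GPU_COUNT, LANES_PER_GPU, LANES_PER_CPU)
print(f"Lanes required: {total_lanes}")
print(f"Sockets needed at {LANES_PER_CPU} lanes each: {sockets}")
```

Under that assumption, 16 cards at x16 need 256 lanes, which is two sockets' worth before anything else is attached. Running the cards at x8 or x4 would change the picture, but that is a different build from the one described here.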

> but there is a competitive market in processors that can do LLM inference

Only for the smallest and smallish models. If such a market existed, why would you set out to build a 16x RTX 3090 machine?

Sorry, but you're just spitting out nonsense.