Each GPU costs ~$50k. You need at least 8 of them to run mid-sized models. Then you need a server to plug those GPUs into. That's not commodity hardware.
More like ~$16k for 16 3090s. AMD chips can also run these models. The parts are expensive but there is a competitive market in processors that can do LLM inference. Less so for training.
I don't know where you got that price from, but a single RTX 3090 is $1,900. 16 of them come to ~$30,000.
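To make the arithmetic explicit, here is a quick sketch in Python. The $1,900 unit price is the figure quoted above, the helper name `total_gpu_cost` is made up for illustration, and the implied per-card price is just the ~$16k estimate divided by 16.

```python
# Rough cost check using the prices quoted in this thread (not live market data).

def total_gpu_cost(unit_price_usd: int, count: int) -> int:
    """Return the total price for `count` GPUs at `unit_price_usd` each."""
    return unit_price_usd * count

GPU_COUNT = 16
UNIT_PRICE_USD = 1_900  # per-card price quoted in this comment

total = total_gpu_cost(UNIT_PRICE_USD, GPU_COUNT)

# What the ~$16k estimate would imply per card.
implied_unit_price = 16_000 / GPU_COUNT

print(f"{GPU_COUNT}x RTX 3090 at ${UNIT_PRICE_USD:,} each: ${total:,}")
print(f"A $16k total implies ${implied_unit_price:,.0f} per card")
```

So the ~$16k figure only works if each card costs about $1,000.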
> The parts are expensive
Now that we've invested ~$30k in GPUs, we only need to find a motherboard that can accommodate 16 PCIe 4.0 x16 GPUs, right? And we also need a CPU that can drive that many PCIe 4.0 x16 lanes?
Well, neither exists, not even among server parts, let alone in client commodity hardware. In any case, you'd need two CPUs, so even with this imaginary motherboard we are already entering server rack design territory. And that costs hundreds of thousands of dollars.
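A back-of-the-envelope lane budget shows where the two-CPU figure comes from. This is only a sketch: the 128 lanes per CPU is an assumed round number for a high-end server part, not a spec for any particular chip, and the helper names are made up for illustration. It also ignores lanes needed for storage, networking, and the link between sockets.

```python
import math

LANES_PER_GPU = 16           # each card wants a full x16 link
GPU_COUNT = 16
ASSUMED_LANES_PER_CPU = 128  # assumption: illustrative figure for a server CPU

def lanes_required(gpu_count: int, lanes_per_gpu: int = LANES_PER_GPU) -> int:
    """Total PCIe lanes needed to give every GPU a full-width link."""
    return gpu_count * lanes_per_gpu

def cpus_required(total_lanes: int, lanes_per_cpu: int) -> int:
    """Minimum CPU count if lanes were the only constraint."""
    return math.ceil(total_lanes / lanes_per_cpu)

needed = lanes_required(GPU_COUNT)
cpus = cpus_required(needed, ASSUMED_LANES_PER_CPU)

print(f"{GPU_COUNT} GPUs x {LANES_PER_GPU} lanes = {needed} lanes")
print(f"At {ASSUMED_LANES_PER_CPU} lanes per CPU: at least {cpus} CPUs")
```

Under that assumption you need 256 lanes, so one socket cannot do it without PCIe switches or dropping each card to a narrower link.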
> but there is a competitive market in processors that can do LLM inference
Only for the smallest and smallish models. If that existed, why would you set out to build a 16x RTX 3090 machine?