by squidbeak
106 days ago
> most people won't ever run local inference because it sucks and is a resource hog most can't afford
You have fallen headfirst into the "Not now, so never" fallacy. As if consumer hardware won't get more powerful, or models more economical.

Perhaps. Though we have empirical evidence of how far we can quantize and distill models before they become practically useless. That sets a bar for how large a local model needs to be for general use if it is to compete with the cloud ones. We are talking in the region of 60GB for GPT-OSS/Qwen3.5, which is what enthusiasts are running on 32GB of DDR5 plus an RTX 3090 with 24GB of VRAM.
> As if consumer hardware won't get more powerful
Now I will let you, with that last fact in hand, plot a chart of how much it has cost to provision that hardware over the past 2 years and use it to prove me wrong about the affordability of local models.