I agree.
We don't need a nondeterministic 10-quadrillion-vector model.
We need a deterministic expert on our narrow business.
Something small that can run on the 2026 version of the spare PC under the CTO's desk.
It will always be worse than a centralized approach, where hardware utilization can remain high, except in cases that demand low latency, which most development tasks do not. It's okay if a code review takes an extra 100 ms.
It's little different from the move from the mainframe/minicomputer to the PC: huge servers and resources will be better--no dispute there--but good enough will be achieved using local resources.
There are privacy and general decentralization reasons to prefer this outcome, even though most AI and cloud-first tech companies don't want it.
How long it will take us to get to this point is a different matter.
By that logic, shouldn’t video games also be centralized via streaming?
Yet startups keep trying it and failing. Turns out users actually want exclusive access to that hardware to have a smooth experience. The tradeoff has always been between faster exclusive hardware and slower but cheaper shared hardware.
If local hardware can’t beat shared hardware on performance, then something’s wrong. Either it’s because the providers are charging wildly below cost or because local hardware just hasn’t needed to catch up. Maybe it’s both.