The tech is evolving too quickly: each year's hardware will be much more powerful at the same price as LLM-specific optimizations reach silicon, so you’d end up replacing the device frequently.
GPUs and NPUs are gaining optimizations for the transformer architecture. It’s not “the GPU is 3x faster this year”; it’s “the GPU has gates specifically designed to accelerate your LLM workload.”
See for instance [0], which is just starting to appear in commercial parts.
This trend is continuing; pretty much every low-cost SoC maker is racing to build and extend ML optimizations.