| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Barathkanna 140 days ago
	Local models help remove token cost uncertainty, but they shift the problem to infrastructure and ops. GPUs, scaling, maintenance, and latency can add up quickly depending on the workload. For many builders it ends up being a tradeoff between predictable infra cost and flexible API usage.