HN Mirror

	by chatmasta 2 hours ago
	Just a hunch, but I think the most cost-effective “local” deployment method right now is renting GPU clusters by the hour and running all the inference software on them yourself. This will be cheaper than capital expenditure on hardware that will depreciate and become last-gen, and cheaper than OpenRouter's pay-per-token pricing.
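
To make the hunch checkable, here's a rough back-of-envelope sketch of the comparison. Everything in it is an assumption for illustration: the function names are made up, and the prices, throughput, and utilization figures are placeholders, not real quotes from any GPU provider or from OpenRouter. The one structural assumption that matters is that rented hours are only paid for while you use them, whereas owned hardware is amortized over its whole lifetime whether it is busy or not.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second, utilization):
    """USD per 1M generated tokens for a machine that costs hourly_cost_usd per hour.

    utilization is the fraction of each paid hour spent actually generating tokens.
    """
    if tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("tokens_per_second must be > 0 and utilization in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def owned_hourly_cost(purchase_usd, lifetime_hours, power_usd_per_hour):
    """Effective hourly cost of owned hardware: straight-line depreciation plus power."""
    if lifetime_hours <= 0:
        raise ValueError("lifetime_hours must be > 0")
    return purchase_usd / lifetime_hours + power_usd_per_hour


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders; plug in your own.
    throughput = 1000  # tokens/second the node sustains under batched load

    # Rented: paid only while the cluster is up, so utilization can be fairly high.
    rented = cost_per_million_tokens(2.00, throughput, utilization=0.5)

    # Owned: amortized over 3 years of wall-clock time, busy only 10% of it.
    hourly = owned_hourly_cost(30_000, lifetime_hours=3 * 8760, power_usd_per_hour=0.30)
    owned = cost_per_million_tokens(hourly, throughput, utilization=0.1)

    api_price = 3.00  # hypothetical pay-per-token price, USD per 1M tokens

    print(f"rented: ${rented:.2f} per 1M tokens")
    print(f"owned:  ${owned:.2f} per 1M tokens")
    print(f"api:    ${api_price:.2f} per 1M tokens")
```

The result swings almost entirely on utilization and throughput. If you can't keep the rented cluster busy during the hours you pay for, or your owned box runs near full load around the clock, the ordering can flip, so treat this as a way to test the claim against your own numbers rather than as confirmation of it.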