HN Mirror

	by bbtc3453 102 days ago
	This is impressive. I've been experimenting with the Gemini API for a side project, and the latency difference between local and cloud inference is something I keep thinking about. How does memory usage scale with the 500B models?