| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by p91paul 149 days ago
	In general, with per minute rate limiting you limit load spikes, and load spikes are what you pay for: they force you to ramp up your capacity, and usually you are then slow to ramp down to avoid paying the ramp up cost too many times. A VM might boot relatively fast, but loading a large model into GPU memory takes time.