HN Mirror

	by akx 916 days ago
	You don't necessarily need to fit the whole model in memory: llama.cpp supports mmapping the model file directly from disk in some cases, so the OS pages weights in as they are accessed. Naturally, inference speed will suffer whenever those pages have to be read from disk.
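
	To illustrate the general mechanism (this is not llama.cpp's code, just a minimal sketch of memory-mapping using Python's standard `mmap` module), the snippet below maps a file read-only and pulls out a small slice. The helper names `make_dummy_weights` and `read_slice_mmap` are made up for this example, and the "weights" file is just filler bytes.

```python
import mmap
import os
import tempfile


def make_dummy_weights(size=1 << 20):
    """Write a throwaway file of repeating bytes 0..255 (hypothetical stand-in for a model file)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 256 for i in range(size)))
    return path


def read_slice_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` by memory-mapping the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping itself does not read the
        # file; the OS loads pages on demand when they are accessed.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies only the requested bytes into a bytes object.
            return mm[offset:offset + length]


if __name__ == "__main__":
    path = make_dummy_weights()
    try:
        print(read_slice_mmap(path, 256, 4))
    finally:
        os.remove(path)
```

	The point of the sketch is that only the pages you actually touch need to be resident, which is why a mapped file can be larger than free RAM, at the cost of disk reads on page faults.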