| HN Mirror

Hacker News


	by physicsgraph 916 days ago
	Thanks for the suggestion. I'm new to running LLMs, so I'll take a look at llama.cpp [0]. My ~10-year-old MacBook Air has 4GB of RAM, so I'm primarily interested in smaller LLMs. [0] https://github.com/ggerganov/llama.cpp

1 comment

akx 916 days ago

You don't necessarily need to fit the whole model in memory – llama.cpp supports mmapping the model directly from disk in some cases. Naturally, inference speed will suffer, since any part of the model that isn't already in RAM has to be read from disk when it's needed.
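To illustrate the general idea (this is not llama.cpp's actual code, just a minimal Python sketch of memory-mapping a file), the snippet below maps a file read-only and pulls out a small slice. The OS only pages in the parts of the file that are actually touched, which is why a file larger than available RAM can still be used. The helper names `read_slice_mmap` and `make_demo_file` are hypothetical, and the demo file stands in for a model file.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` using a read-only memory map.

    The file is mapped rather than read in full; the OS loads pages on demand.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies only the requested bytes out of the mapping.
            return mm[offset:offset + length]


def make_demo_file(size):
    """Create a temporary file of `size` bytes (stand-in for a model file)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 256 for i in range(size)))
    return path


if __name__ == "__main__":
    path = make_demo_file(1024)
    try:
        chunk = read_slice_mmap(path, 256, 4)
        print(chunk)
    finally:
        os.remove(path)
```

The trade-off is the one mentioned above: the first access to each region costs a disk read, so a machine with little RAM will spend a lot of time re-reading pages that got evicted.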
