Don't have a $5k MacBook to run LLAMA65B? MiniLLM runs LLMs on GPUs in <500 LOC


	Don't have a $5k MacBook to run LLAMA65B? MiniLLM runs LLMs on GPUs in <500 LOC (github.com)
	3 points by volodia 1194 days ago

1 comments

Doesn't this use as much VRAM as llama.cpp (with int4 models) uses RAM? RAM is a lot cheaper than VRAM.

It won't run as fast on your CPU as it will on a GPU. It may also take up most of your RAM, so it's better to offload to a cheap GPU.
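For a rough sense of the numbers behind this exchange, here is a back-of-the-envelope sketch of how much memory the weights alone would need at a given precision. It assumes a nominal 65e9 parameters for the 65B model and counts only the weights. It ignores quantization scales, the KV cache, and activations, so real usage will be somewhat higher. The function name `weight_memory_gib` is made up for illustration and does not come from MiniLLM or llama.cpp.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every weight is stored at exactly `bits_per_weight` bits,
    with no per-group scales or other quantization overhead.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Nominal parameter count (assumed round figure, not an exact count).
N_PARAMS_65B = 65e9

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = weight_memory_gib(N_PARAMS_65B, bits)
    print(f"65B @ {label}: ~{gib:.1f} GiB for weights")
```

The estimate is the same whether the weights sit in system RAM or in VRAM, which is the point of the question above: at int4 the footprint is about the same either way, and the trade-off is the cost per gigabyte against the speed of the device doing the math.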