| HN Mirror



	by jbandela1 1144 days ago
	I think llama.cpp might be easier to set up and get running. https://github.com/ggerganov/llama.cpp


loudmax 1144 days ago

I second this recommendation to start with llama.cpp. It can run on a regular laptop and it gives a sense of what's possible.

If you want access to a serious GPU or TPU, then the sensible solution is to rent one in the cloud. If you just want to run smaller versions of these models, you can achieve impressive results at home on consumer-grade gaming hardware.

The FastChat framework supports the Vicuna LLM, along with several others: https://github.com/lm-sys/FastChat

The Oobabooga web interface aims to become the standard interface for chat models: https://github.com/oobabooga/text-generation-webui

I don't see any indication that OpenLLaMA will run on either of those without modification. But one of those, or some other framework, may emerge as a de facto standard for running these models.


JLCarveth 1144 days ago

Yes, I can clone this and get into a prompt in less than 5 minutes on an M2 MBA.


quickthrower2 1144 days ago

Might try it first. Seems to be CPU only?


azeirah 1144 days ago

It has partial GPU acceleration if you compile it with LLAMA_CUBLAS or LLAMA_CLBLAST.
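
For anyone who wants to script the build, here is a rough sketch of how those two options could be passed to make. It assumes the Makefile accepts them as `NAME=1` variables, which was the case around the time of this thread; newer llama.cpp versions have changed the build options, so check the current README first. The `build_command` helper and the backend names are made up for illustration and are not part of llama.cpp.

```python
# Hypothetical helper: maps a backend name to a make invocation.
# Assumption: the llama.cpp Makefile of this era takes LLAMA_CUBLAS=1 or
# LLAMA_CLBLAST=1 as make variables. Later versions may differ.
BACKEND_FLAGS = {
    "cpu": None,                 # plain CPU-only build, no extra variable
    "cublas": "LLAMA_CUBLAS",    # NVIDIA GPUs via cuBLAS
    "clblast": "LLAMA_CLBLAST",  # OpenCL devices via CLBlast
}


def build_command(backend="cpu"):
    """Return the make command, as an argument list, for the given backend."""
    if backend not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend: {backend!r}")
    command = ["make"]
    flag = BACKEND_FLAGS[backend]
    if flag is not None:
        command.append(f"{flag}=1")
    return command


if __name__ == "__main__":
    # Only prints the command; run it yourself inside the llama.cpp checkout.
    print(" ".join(build_command("cublas")))
```

The function only builds the argument list, so it can be handed to `subprocess.run(..., cwd=<your checkout>)` or just copied into a terminal.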

They really have come a long way since... a few weeks ago.

Using cuBLAS with my 1080 Ti results in a 52% speedup compared to CPU-only. VRAM usage is very minimal.
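
To put the 52% figure in perspective, here is a quick back-of-the-envelope sketch. It assumes "52% speedup" means throughput is 1.52 times the CPU-only rate, which the comment does not spell out. The 10 tokens/s baseline is an invented number for illustration only, not a measurement.

```python
def throughput_after_speedup(baseline_rate, speedup_percent):
    """Throughput after a speedup, assuming speedup applies to the rate."""
    return baseline_rate * (1.0 + speedup_percent / 100.0)


def relative_time(speedup_percent):
    """Fraction of the original wall-clock time needed after the speedup."""
    return 1.0 / (1.0 + speedup_percent / 100.0)


if __name__ == "__main__":
    baseline = 10.0  # hypothetical CPU-only tokens per second
    faster = throughput_after_speedup(baseline, 52)
    print(f"GPU-assisted rate: {faster:.1f} tokens/s")
    print(f"Time per token: {relative_time(52):.1%} of CPU-only")
```

Under that reading, the same generation takes roughly two thirds of the CPU-only time, which is a solid gain but well short of a full GPU offload.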


themulticaster 1144 days ago

I'd see that as a benefit of llama.cpp - it's specifically designed to be usable on consumer hardware such as laptops, without professional GPUs.
