Hacker News
by jbandela1 1144 days ago
I think llama.cpp might be easier to set up and get running.

https://github.com/ggerganov/llama.cpp

3 comments

I second this recommendation to start with llama.cpp. It can run on a regular laptop and it gives a sense of what's possible.

If you want access to a serious GPU or TPU, the sensible solution is to rent one in the cloud. If you just want to run smaller versions of these models, you can get impressive results at home on consumer-grade gaming hardware.

The FastChat framework supports the Vicuna LLM, along with several others: https://github.com/lm-sys/FastChat

The Oobabooga web interface aims to become the standard interface for chat models: https://github.com/oobabooga/text-generation-webui

I don't see any indication that OpenLLaMA will run on either of those without modification. But one of those, or some other framework, may emerge as a de facto standard for running these models.

Yes, I can clone this and get into a prompt in less than 5 minutes on an M2 MBA.
Might try it first. Seems to be CPU-only?
It has partial GPU acceleration if you compile it with LLAMA_CUBLAS or LLAMA_CLBLAST.
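
For anyone scripting the build, here is a rough sketch of how those two flags might be selected. It assumes they are passed to `make` as `NAME=1` variables, which I believe matched the README around the time of this thread but may have changed since. The `build_command` and `build` helpers and the backend labels are made up for illustration, not part of llama.cpp.

```python
import subprocess

# Flag names come from the comment above. Passing them as NAME=1 make
# variables is an assumption about the Makefile at the time of this thread.
BACKEND_FLAGS = {
    "cpu": None,                 # plain build, no GPU acceleration
    "cublas": "LLAMA_CUBLAS",    # NVIDIA GPUs via cuBLAS
    "clblast": "LLAMA_CLBLAST",  # OpenCL devices via CLBlast
}


def build_command(backend: str = "cpu") -> list[str]:
    """Return the make invocation for the chosen (hypothetical) backend label."""
    if backend not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend: {backend!r}")
    flag = BACKEND_FLAGS[backend]
    return ["make"] if flag is None else ["make", f"{flag}=1"]


def build(repo_dir: str, backend: str = "cpu") -> None:
    """Run the build inside an already cloned llama.cpp checkout."""
    subprocess.run(build_command(backend), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Only prints the command; call build("path/to/llama.cpp", "cublas") to run it.
    print(" ".join(build_command("cublas")))
```

Check the repo's current README before relying on these names, since build options in a project moving this fast tend to get renamed.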

They really have come a long way since... a few weeks ago.

Using cuBLAS with my 1080 Ti gives a 52% speedup over CPU-only. VRAM usage is minimal.
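
To put a figure like that in context, here is a small sketch of the arithmetic. It assumes "speedup" means higher throughput (tokens per second) rather than shorter wall-clock time, which the comment does not specify. The throughput numbers are hypothetical, not measurements from this thread.

```python
def speedup_percent(baseline_tok_s: float, accelerated_tok_s: float) -> float:
    """Percent throughput gain of the accelerated run over the baseline."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return (accelerated_tok_s / baseline_tok_s - 1.0) * 100.0


def time_saved_percent(speedup_pct: float) -> float:
    """Percent reduction in generation time implied by a throughput speedup."""
    return (1.0 - 1.0 / (1.0 + speedup_pct / 100.0)) * 100.0


if __name__ == "__main__":
    # Hypothetical throughputs chosen so the gain works out to 52%.
    cpu_only_tok_s = 5.0
    with_cublas_tok_s = 7.6
    print(round(speedup_percent(cpu_only_tok_s, with_cublas_tok_s), 1))  # 52.0
    print(round(time_saved_percent(52.0), 1))                            # 34.2
```

So under that reading, a 52% throughput gain means the same output takes roughly a third less time, not half.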

I'd see that as a benefit of llama.cpp - it's specifically designed to be usable on consumer hardware such as laptops, without professional GPUs.