by e12e 1059 days ago
This looks like a great project. Given the costs, I imagine many might want to run on dedicated hardware with GPU - yet:

> GPT4All: When you run locally, RAGstack will download and deploy Nomic AI's gpt4all model, which runs on consumer CPUs.

> Falcon-7b: On the cloud, RAGstack deploys Technology Innovation Institute's falcon-7b model onto a GPU-enabled GKE cluster.

> Llama 2: On the cloud, RAGstack can also deploy the 7B parameter version of Meta's Llama 2 model onto a GPU-enabled GKE cluster.

Why not llama2 on dedicated/local hardware? Memory and download size requirements?

Edit: After reading the linked tutorial, it looks like the built Docker container will run fine on local/dedicated hardware?

https://www.psychic.dev/post/how-to-deploy-llama-2-to-google...


Yep, the Docker containers should run fine on local hardware, but the Terraform config only supports GCP right now.

In terms of cost - we just ran our deployed cluster through GCP's pricing calculator and it's about $300 USD per month. Definitely not cheap for individual use, but pretty affordable for enterprise use. Running the 40B parameter version will cost significantly more.

Out of curiosity, how does that GCP instance compare to my modest gaming rig (Nvidia 3080, 24(?) GB VRAM / Ryzen 7 / 64 GB RAM)? (Since I'm paying $0/month for it ...)
What is the capacity for that price?
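
For the memory side of the question, a rough back-of-envelope might help. This is only a sketch: it assumes the weights dominate and take (parameter count x bytes per parameter), and it ignores the KV cache, activations, and framework overhead. The function names and the 80% headroom factor are made up for illustration, not anything from RAGstack.

```python
# Bytes needed per parameter at common precisions.
PRECISIONS = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    return n_params * bytes_per_param / 2**30


def fits(n_params: float, vram_gib: float, headroom: float = 0.8) -> dict:
    """For each precision, report whether the weights fit in a fraction of VRAM.

    `headroom` is an assumed safety factor (hypothetical, not measured) that
    leaves room for the KV cache and other runtime overhead.
    """
    budget = vram_gib * headroom
    return {
        name: weight_memory_gib(n_params, nbytes) <= budget
        for name, nbytes in PRECISIONS.items()
    }


if __name__ == "__main__":
    n_params = 7e9  # the 7B models mentioned in the quoted README
    for name, nbytes in PRECISIONS.items():
        print(f"{name}: {weight_memory_gib(n_params, nbytes):.1f} GiB")
    print("24 GiB card:", fits(n_params, 24))
    print("10 GiB card:", fits(n_params, 10))
```

Under these assumptions a 7B model needs roughly 13 GiB for fp16 weights, so whether it fits on a given card depends mostly on how much VRAM the card really has and whether you quantize.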