by e12e
1059 days ago
This looks like a great project. Given the costs, I imagine many might want to run on dedicated hardware with a GPU - yet:

> GPT4All: When you run locally, RAGstack will download and deploy Nomic AI's gpt4all model, which runs on consumer CPUs.

> Falcon-7b: On the cloud, RAGstack deploys Technology Innovation Institute's falcon-7b model onto a GPU-enabled GKE cluster.

> Llama 2: On the cloud, RAGstack can also deploy the 7B parameter version of Meta's Llama 2 model onto a GPU-enabled GKE cluster.

Why not Llama 2 on dedicated/local hardware? Is it the memory and download size requirements?

Edit: After reading the linked tutorial, it looks like the built Docker container will run fine on local/dedicated hardware? https://www.psychic.dev/post/how-to-deploy-llama-2-to-google...
|
In terms of cost - we just ran our deployed cluster through GCP's pricing calculator and it comes to about $300 USD per month. Definitely not cheap for individual use, but pretty affordable for enterprise use. Running the 40B parameter version will cost significantly more.
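
On the memory question, here is a rough back-of-the-envelope sketch for why the 40B model needs much bigger (and pricier) GPUs than the 7B one. It only counts the model weights (parameter count times bytes per parameter) and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The byte widths (2 bytes for fp16, 0.5 bytes for 4-bit quantization) and the function name are my own assumptions for illustration, not anything taken from RAGstack.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    Excludes activations, KV cache, and runtime overhead.
    """
    return n_params * bytes_per_param / 1e9


# Assumed precisions: fp16 = 2 bytes/param, 4-bit quantized = 0.5 bytes/param.
for name, n_params in [("7B", 7e9), ("40B", 40e9)]:
    fp16 = weight_memory_gb(n_params, 2.0)
    q4 = weight_memory_gb(n_params, 0.5)
    print(f"{name}: ~{fp16:.1f} GB in fp16, ~{q4:.1f} GB at 4-bit")
```

By this estimate a 7B model is around 14 GB of weights in fp16, which fits on a single 16 GB or 24 GB GPU, while a 40B model is around 80 GB before any overhead, so it needs a larger accelerator or several of them.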