Ask HN: How to run Language Models on your own?
9 points
by sudhirc
1142 days ago
As someone who is new to running Language Models, I am struggling to understand the infrastructure needed to run them effectively. I would greatly appreciate any advice you can offer. Could you please help me with the following questions:
1. What are the hardware specifications you would recommend for running Language Models?
2. What are the building options available for Language Models and which one is the easiest to set up?
3. Is it better to rent or buy hardware for running Language Models?
4. What are some cost-saving strategies that have worked for you when running Language Models?
If you want to run Vicuna without quantization you need 25GB of VRAM, which is more than pretty much any consumer GPU has. Vicuna 4-bit GPTQ is decent, though I personally notice a quality difference when comparing it to 16-bit.
CPU is also an option: you can run pretty much any model that will fit in your RAM, although performance will obviously suffer. LlamaCPP has gotten very popular.
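For a rough sanity check of those numbers, here is a back-of-the-envelope sketch. It assumes the 13B Vicuna variant (the size is my assumption, not stated above) and counts only the weights; activations and the KV cache add more on top. The 24 GB budget is just an illustrative figure for a high-end consumer card, and the function names are made up for this example.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: int, available_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM or RAM budget."""
    return weight_memory_gb(n_params, bits_per_weight) <= available_gb


N_PARAMS = 13e9   # assumption: the 13B Vicuna variant
BUDGET_GB = 24    # assumption: illustrative VRAM (or RAM) budget

for bits in (16, 4):
    size = weight_memory_gb(N_PARAMS, bits)
    ok = fits(N_PARAMS, bits, BUDGET_GB)
    print(f"{bits}-bit: ~{size:.1f} GB of weights, fits in {BUDGET_GB} GB: {ok}")
```

The same check works for CPU inference: swap the budget for your free system RAM and leave some headroom for the OS and the context.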