by tmaly 405 days ago
What is the minimum GPU VRAM needed to run this? I did not see that on the GitHub page.
1 comment

With the current 24B LLM it's 24 GB. I have no clue how far down you can go on GPU memory by using smaller models; you can set the model in server.py. I'm quite sure 16 GB will work, but at some point it will probably fail.
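
For a rough feel of how model size maps to VRAM, here is a back-of-envelope sketch. It is not from the project: the function name `estimate_weight_gb` and the bit widths are my own assumptions, and it counts the model weights only. KV cache, activations, and anything else loaded alongside the LLM add to the total, so treat the numbers as a lower bound.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper for illustration; ignores KV cache, activations,
    and framework overhead, so real usage will be higher.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Assumed model sizes and precisions, purely for comparison.
    for params in (24, 12, 7):
        for bits in (16, 8, 4):
            gb = estimate_weight_gb(params, bits)
            print(f"{params}B @ {bits}-bit: ~{gb:.1f} GB for weights")
```

By this estimate a 24B model at 8 bits per weight needs about 24 GB for weights and about 12 GB at 4 bits. Whether the 24 GB figure above corresponds to one of those precisions is a guess on my part; the thread doesn't say.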