Hacker News new | ask | show | jobs
by MacsHeadroom 1151 days ago
The best option right now is SuperCOT 30B, a LoRA for LLaMA trained on Chain-of-Thought coding questions and answers. You will need at least 24GB of VRAM to load the 4bit GPTQ quantized LoRA merged model* locally. https://huggingface.co/tsumeone/llama-30b-supercot-4bit-128g...

*The merged model contains the LoRA. Applying the LoRA over a "raw" llama model uses more VRAM and does not allow for full context in 24GB of VRAM.