| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by MacsHeadroom 1151 days ago
	The best option right now is SuperCOT 30B, a LoRA for LLaMA trained on Chain-of-Thought coding questions and answers. You will need at least 24GB of VRAM to load the 4bit GPTQ quantized LoRA merged model* locally. https://huggingface.co/tsumeone/llama-30b-supercot-4bit-128g... *The merged model contains the LoRA. Applying the LoRA over a "raw" llama model uses more VRAM and does not allow for full context in 24GB of VRAM.