by turmeric_root 1158 days ago
More VRAM => larger models. IME it is absolutely worth maxing out VRAM for the significant improvement in quality, especially with LLaMA (though even a 4090, with its 24GB of VRAM, can't run the largest 65-billion-parameter model with 4-bit quantization, since the 4-bit weights alone need roughly 32GB).

That said, I recommend renting a cloud GPU for a few hours and trying the larger models on it before buying a GPU of your own, just to see if the models meet your requirements.

1 comment

But it should fit easily on an Apple MacBook Pro or Mac Studio with 96GB or 128GB of unified memory.
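
For anyone who wants to sanity-check the numbers in this thread, here is a rough back-of-envelope sketch. It counts only the memory for the weights, so treat the result as a lower bound: activations, the KV cache, and per-block quantization metadata all add to it, and on a Mac not all of the unified memory is available to the GPU. The helper names are made up for illustration; the 24GB figure is the RTX 4090's VRAM.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def weights_fit(needed_gb: float, available_gb: float) -> bool:
    """Necessary (not sufficient) condition: the weights must fit in memory."""
    return needed_gb <= available_gb


LLAMA_65B_PARAMS = 65e9   # largest LLaMA model mentioned above
RTX_4090_VRAM_GB = 24     # VRAM on an RTX 4090
MAC_UNIFIED_GB = 96       # smaller of the two Mac configurations mentioned

needed = weight_memory_gb(LLAMA_65B_PARAMS, bits_per_weight=4)
print(f"65B @ 4-bit weights: ~{needed:.1f} GB")
print("fits on 4090:", weights_fit(needed, RTX_4090_VRAM_GB))
print("fits in 96GB unified memory:", weights_fit(needed, MAC_UNIFIED_GB))
```

The same function with `bits_per_weight=16` gives the unquantized fp16 footprint, which makes it easy to see why quantization matters so much on consumer hardware.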