by brucethemoose2
1086 days ago
I can easily run LLaMA 13B on my 6GB VRAM/16GB RAM laptop using llama.cpp (specifically Kobold.cpp as the frontend). I can barely run 33B: anything more than about 800 tokens of context and I OOM. But it would run very comfortably on a bigger GPU or a 24GB+ laptop. Theoretically, some phones can comfortably handle 13B on mlc-llm, though in practice it's not really implemented yet.
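For a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes roughly 4.5 bits per weight (a typical 4-bit quantization with scale overhead; the actual quant isn't stated above), an fp16 KV cache, and the published LLaMA 33B shape of 60 layers with hidden size 6656. The helper names `weight_gib` and `kv_cache_gib` are made up for illustration and are not part of llama.cpp or Kobold.cpp. It ignores scratch buffers, the OS, and everything else competing for memory.

```python
GIB = 2**30

def weight_gib(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB

def kv_cache_gib(n_layers, hidden_size, n_tokens, bytes_per_value=2):
    """Approximate KV cache size in GiB.

    Each token stores one key and one value vector of length
    hidden_size in every layer (hence the factor of 2).
    """
    return 2 * n_layers * hidden_size * n_tokens * bytes_per_value / GIB

# Assumption: ~4.5 bits/weight; the real figure depends on the quant format.
BITS_PER_WEIGHT = 4.5

w13 = weight_gib(13, BITS_PER_WEIGHT)
w33 = weight_gib(33, BITS_PER_WEIGHT)
# Assumption: LLaMA 33B shape (60 layers, hidden size 6656), fp16 cache.
kv33 = kv_cache_gib(n_layers=60, hidden_size=6656, n_tokens=800)

print(f"13B weights:          ~{w13:.1f} GiB")
print(f"33B weights:          ~{w33:.1f} GiB")
print(f"33B KV cache @ 800:   ~{kv33:.1f} GiB")
print(f"33B total (approx.):  ~{w33 + kv33:.1f} GiB of 22 GB combined VRAM + RAM")
```

Under those assumptions, 13B leaves plenty of headroom on a 6GB + 16GB machine, while 33B plus a modest context gets close enough to the combined total that running out of memory is plausible once the rest of the system takes its share.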