by refulgentis
930 days ago
This is extremely misleading. Source: I've been working with local LLMs for the past 10 months, and I have a Mac laptop too. I'm bullish as well, but we shouldn't breezily dismiss those concerns. In practice, you get single-digit tokens per second on a $4500 laptop for a model with weights half this size (Llama 2 70B Q2 GGUF => 29 GB, Q8 => 36 GB).
|
2B for the attention heads and 5B from each of 2 experts.
It should be able to run slightly faster than a 13B dense model, in as little as 16GB of RAM with room to spare.
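
For anyone who wants to check the arithmetic behind that count, here is a rough sketch. The function names and the 4-bit figure are my own assumptions for illustration, not anything from the model's documentation. It only counts the weights touched per token, so by itself it says nothing about how many experts have to stay loaded in RAM.

```python
def active_params_b(shared_b, per_expert_b, experts_per_token):
    """Parameters (in billions) used per token: shared layers plus the routed experts."""
    return shared_b + per_expert_b * experts_per_token


def weight_size_gb(params_b, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring file-format overhead."""
    return params_b * bits_per_weight / 8


# Figures from the comment above: 2B shared (attention) + 5B from each of 2 experts.
active = active_params_b(shared_b=2, per_expert_b=5, experts_per_token=2)
print(active)  # 12, i.e. slightly fewer active parameters than a 13B dense model

# Hypothetical 4-bit quantization, counting the active weights only (an assumption).
print(weight_size_gb(active, bits_per_weight=4))  # 6.0
```

That gives the "slightly faster than 13B" part: per-token compute tracks the roughly 12B active parameters. Memory is a separate question, because it depends on the full set of expert weights rather than the two picked for a given token.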