by halflings
1016 days ago
I was able to run the 4-bit quantized Llama 2 7B on a 2070 Super, though latency was so-so. I was surprised by how fast it runs on an M2 MBP with llama.cpp: way, way faster than ChatGPT, and that's without even using the Apple Neural Engine.

It's more than fast enough for my experiments, and the laptop doesn't seem to break a sweat.