by krychu
1052 days ago
Self-plug: here's a fork of the original Llama 2 code, adapted to run on the CPU or on MPS (the M1/M2 GPU) if available. The fork is at https://github.com/krychu/llama and runs with the original weights, reaching ~4 tokens/sec on a MacBook Pro M1 with the 7B model.
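
For anyone curious what "CPU or MPS if available" tends to look like in PyTorch, here is a minimal sketch of the device fallback. This is not taken from the fork; `pick_device` is a hypothetical helper name, and the only PyTorch call assumed is `torch.backends.mps.is_available()`. If PyTorch is not installed, the sketch simply reports the CPU.

```python
def pick_device() -> str:
    """Return "mps" when PyTorch reports a usable MPS backend, else "cpu".

    Hypothetical helper for illustration; not code from the linked fork.
    """
    try:
        import torch
    except ImportError:
        # No PyTorch installed: nothing to accelerate, report the CPU.
        return "cpu"

    # Older PyTorch builds have no `torch.backends.mps` attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

The returned string can be passed to `torch.device(...)` or to `.to(...)` when loading the model and creating tensors, so the same script runs on an Apple Silicon GPU or falls back to the CPU.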