

by krychu 1100 days ago

Self-plug. Here’s a fork of the original Llama 2 code, adapted to run on the CPU or, if available, on MPS (the M1/M2 GPU):

It runs with the original weights and gets you to ~4 tokens/sec on a MacBook Pro M1 with the 7B model.
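
For anyone curious what the "CPU or MPS if available" fallback looks like in practice, here is a minimal sketch, not the fork's actual code. It assumes PyTorch, whose `torch.backends.mps.is_available()` reports whether the Apple GPU backend can be used; the helper names `detect_mps` and `pick_device` are made up for illustration.

```python
def detect_mps() -> bool:
    """Return True if PyTorch is installed and its MPS backend is usable."""
    try:
        import torch  # assumption: the model code is PyTorch-based
    except ImportError:
        return False
    # Older PyTorch builds have no torch.backends.mps, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    return bool(mps is not None and mps.is_available())


def pick_device(mps_available: bool) -> str:
    """Hypothetical helper: prefer the Apple GPU, otherwise fall back to CPU."""
    return "mps" if mps_available else "cpu"


device = pick_device(detect_mps())
print(f"running on: {device}")
```

The resulting string can then be passed wherever the model and tensors are created, e.g. `model.to(device)`, so the same script runs on machines with and without an M1/M2 GPU.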