by Terretta
547 days ago
Compare performance on various Macs here as it gets updated: https://github.com/ggerganov/llama.cpp/discussions/4167

On my machine, a MacBook Pro (Max chip, 128GB), Llama 3.3 70B runs at ~7 tokens per second for text generation, and the output feels like GPT-4, with more in-depth responses and fewer bullets. Llama 3.3 70B also doesn't fight the system prompt; it leans in.

Consider e.g. LM Studio (0.3.5 or newer) for a Metal (MLX)-centered UI, and include "MLX" in your search term when downloading models.

Also, do not scrimp on storage. At 60GB to 100GB per model, it takes only a day of experimentation to fill 2.5TB of storage with your model cache. And remember to exclude that path from your Time Machine backups.
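To put rough numbers on the storage point, here is a minimal Python sketch. It is only an illustration: the helper names are made up for this example, and the cache path in it is an assumption (LM Studio's model directory varies by version and settings), so point it at wherever your models actually live.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def models_that_fit(budget_tb, model_gb):
    """Whole models of model_gb each that fit in budget_tb (decimal units)."""
    return int(budget_tb * 1000 // model_gb)


if __name__ == "__main__":
    # 2.5TB holds roughly 25 to 41 models at 100GB to 60GB apiece.
    print(models_that_fit(2.5, 100), models_that_fit(2.5, 60))

    # ASSUMPTION: hypothetical cache location; adjust to your own setup.
    cache = os.path.expanduser("~/.lmstudio/models")
    if os.path.isdir(cache):
        print(f"{cache}: {dir_size_bytes(cache) / 1e9:.1f} GB")
```

So "a day of experimentation" means a few dozen downloads. For the backup exclusion, macOS ships `tmutil addexclusion <path>`, which should do the job from a terminal; check `man tmutil` for the details on your OS version.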