by jmorgan
778 days ago
I think it's worth it, although it might be best to wait for the next iteration: there are rumors that the M4 Macs will support up to 512GB of memory [1]. The current 128GB (e.g. M3 Max) and 192GB (e.g. M2 Ultra) Macs can already run these large models. For example, on the M2 Ultra, the Qwen 110B model, 4-bit quantized, gets almost 10 t/s using Ollama [2] or other tools built on llama.cpp. There's also the benefit of being able to load several different models simultaneously, which is becoming important for RAG and agent-related workflows.

[1] https://www.macrumors.com/2024/04/11/m4-ai-chips-late-2024/
[2] https://ollama.com/library/qwen:110b
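
As a rough sanity check on why a 4-bit 110B model fits on these machines, here is a back-of-the-envelope sketch. The function names and the `overhead_fraction` default are my own placeholders, not anything from Ollama or llama.cpp; the overhead (KV cache, runtime buffers) is an assumed figure, not a measurement, and real 4-bit formats usually store slightly more than 4 bits per weight.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_weight: float,
         memory_gb: float, overhead_fraction: float = 0.2) -> bool:
    """Rough check: weights plus an ASSUMED overhead must fit in memory.

    overhead_fraction is a hypothetical allowance for KV cache and
    runtime buffers; tune it for your context length and tooling.
    """
    needed = weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


if __name__ == "__main__":
    for mem in (64, 128, 192):
        print(f"110B @ 4-bit in {mem}GB: {fits(110, 4, mem)}")
    print(f"110B @ 16-bit in 192GB: {fits(110, 16, 192)}")
```

By this estimate the weights come to about 55GB at 4 bits, which leaves room on a 128GB or 192GB machine for a second model alongside it, while the unquantized 16-bit weights (about 220GB) would not fit on either.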