|
|
|
|
|
by simonw
352 days ago
|
|
I'm having trouble running this on my Mac - I've tried Ollama and llama.cpp llama-server so far, both using GGUFs from Hugging Face, but neither worked. (llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'smollm3') I've managed to run it using Python and transformers with PyTorch in device="cpu" mode but unsurprisingly that's really slow - it took 35s to respond to "say hi"! Anyone had success with this on a Mac yet? I really want to get this running with tool calling, ideally via an OpenAI-compatible serving layer like llama-server. |
|
The easiest would be to install llama.cpp from source: https://github.com/ggml-org/llama.cpp
If you want to avoid it, I added SmolLM3 to MLX-LM as well:
You can run it via `mlx_lm.chat --model "mlx-community/SmolLM3-3B-bf16"`
(requires the latest mlx-lm to be installed)
here's the MLX-lm PR if you're interested: https://github.com/ml-explore/mlx-lm/pull/272
similarly, llama.cpp here: https://github.com/ggml-org/llama.cpp/pull/14581
Let me know if you face any issues!