
I ran them again several times to make sure the results were fair. My previous runs also had a different 30B model loaded in the background that I forgot about.

LM Studio is an easy way to use both mlx and llama.cpp.

anemll [0]: ~9.3 tok/sec

mlx [1]: ~50 tok/sec

gguf (llama.cpp b5219) [2]: ~41 tok/sec
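For a quick sense of the gaps, here is a minimal sketch that turns the approximate numbers above into relative speeds. The `relative_speed` helper and the dictionary keys are made up for illustration, and since the figures are rough single-machine measurements, the ratios are ballpark at best.

```python
# Approximate throughput figures quoted above (tokens per second).
results = {
    "anemll": 9.3,
    "mlx": 50.0,
    "gguf (llama.cpp b5219)": 41.0,
}


def relative_speed(results, baseline):
    """Return each backend's throughput as a multiple of the baseline's."""
    base = results[baseline]
    return {name: tps / base for name, tps in results.items()}


# Hypothetical choice: use the slowest backend as the baseline.
ratios = relative_speed(results, "anemll")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x anemll")
```

Swapping the baseline to `"gguf (llama.cpp b5219)"` compares mlx against llama.cpp directly instead.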

[0] https://huggingface.co/anemll/anemll-DeepSeekR1-8B-ctx1024_0...

[1] https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Lla...

[2] (8bit) https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-...