by diggan
300 days ago
> Luminal can run Q8 Llama 3 8B on M-series Macbooks at 15-25 tokens per second. The goal is to become the fastest ML framework for any model on any device.

Great that some numbers are provided, but in isolation I'm not sure what they tell us. It would be helpful to also share the tok/s you'd get with llama.cpp or another runtime on the same hardware, so we can actually see whether it's faster or not :) Including prompt-processing speed would be a bonus!

Nonetheless, this project looks very cool, and I hope they can keep improving it to the point where it actually beats human-led optimizations.