by polishgladiator
1044 days ago
> [...] llama.cpp is a fantastic framework to run models locally for the single-user case (batch=1)
> [...] I don't think it would be particularly fair to compare and show that MKML is better at a given perplexity, compression ratio, and speed on GPU for a multi-user case (batch >> 1)

OK, so you agree that llama.cpp etc. are great for batch==1, right? And I agree their targeted use case is not batch==32 (because who is really doing that?).

But if we extended llama.cpp, or some other faster batch==1 implementation, to support batch==32, why do you suppose it wouldn't still be faster than MKML? It seems to me that if you can do batch==1 faster, you could easily do batch>>1 faster too -- it is just that no one has really needed that (yet?).