by anoncareer0212
434 days ago
Small point of order: "a bit slower" might not set expectations accurately. You noted in a previous post in the same thread[^1] that we'd expect prompt processing to take about 1 minute per 10K tokens(!) with the smaller model. I agree, and I contribute to llama.cpp. If anything, that estimate is quite generous.

[^1] https://news.ycombinator.com/item?id=43595888
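For a sense of scale, here's a back-of-envelope sketch of what that rate means in wall-clock time. It assumes the ~1 minute per 10K tokens figure scales linearly with prompt length. That is an assumption: real prompt processing usually slows as context grows, so treat these numbers as a best case. The function name is just for illustration.

```python
# Back-of-envelope prompt-processing estimate.
# ASSUMPTION: the ~1 minute per 10K tokens rate holds linearly; in practice
# it tends to get worse at long context, so this is a best case.
SECONDS_PER_10K_TOKENS = 60.0


def prompt_processing_seconds(prompt_tokens: int) -> float:
    """Estimated seconds to process a prompt of `prompt_tokens` tokens."""
    return prompt_tokens / 10_000 * SECONDS_PER_10K_TOKENS


if __name__ == "__main__":
    rate = 10_000 / SECONDS_PER_10K_TOKENS  # tokens per second
    print(f"~{rate:.0f} tokens/s")
    for n in (2_000, 10_000, 32_000):
        minutes = prompt_processing_seconds(n) / 60
        print(f"{n:>6} tokens -> {minutes:.1f} min before the first output token")
```

At that rate a 32K-token prompt means waiting over three minutes before generation even starts, which is a long way from "a bit slower".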