|
|
|
|
|
by visarga
47 days ago
|
|
Large LLMs on MacBook produce tokens at an acceptable speed but the problem is reading context. Not incremental reading like when you have a chat session, because they use KV cache, but large size reading, like when you paste a big file. It can take minutes. |
|