|
|
|
|
|
by solarkraft
249 days ago
|
|
Anecdotally, even medium-sized prompts (a few thousand tokens) on pretty small models (8-2B) have resulted in extremely noticeable slowdowns (vast majority of total processing time) on my M1 Mac, leading me to appreciate the significance of the pre-fill step (and difficulty of processing large contexts locally). |
|