Hacker News new | ask | show | jobs
by solarkraft 249 days ago
Anecdotally, even medium-sized prompts (a few thousand tokens) on pretty small models (8-2B) have resulted in extremely noticeable slowdowns (vast majority of total processing time) on my M1 Mac, leading me to appreciate the significance of the pre-fill step (and difficulty of processing large contexts locally).