| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by solarkraft 249 days ago
	Anecdotally, even medium-sized prompts (a few thousand tokens) on pretty small models (8-2B) have resulted in extremely noticeable slowdowns (vast majority of total processing time) on my M1 Mac, leading me to appreciate the significance of the pre-fill step (and difficulty of processing large contexts locally).