| HN Mirror



	by rgbrgb 998 days ago
	FWIW, I get more like 35-40 tokens/sec on my M1 MacBook with a 7B model. That's way faster than I can read or skim. If we can figure out how to focus the expertise in small models, I don't see why they wouldn't be viable for those of us who don't want to share all of our convos with big tech.
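	For a rough sense of scale, here is a back-of-the-envelope conversion from tokens/sec to words per minute. Both constants are assumptions for illustration and not measurements from the comment: about 0.75 English words per token (a common rule of thumb) and a reading speed of about 250 words per minute. The function name is made up for this sketch.

```python
# Rough conversion from generation speed (tokens/sec) to words per minute.
# Both constants are assumptions for illustration, not measurements.
WORDS_PER_TOKEN = 0.75  # assumed rule of thumb for English text
READING_WPM = 250       # assumed typical reading speed


def tokens_per_sec_to_wpm(tokens_per_sec: float,
                          words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a tokens/sec rate into an approximate words/min rate."""
    return tokens_per_sec * words_per_token * 60


if __name__ == "__main__":
    for tps in (35, 40):
        wpm = tokens_per_sec_to_wpm(tps)
        ratio = wpm / READING_WPM
        print(f"{tps} tok/s is roughly {wpm:.0f} words/min, "
              f"about {ratio:.1f}x the assumed reading speed")
```

	Under those assumptions, 35-40 tokens/sec works out to roughly 1,575-1,800 words per minute, several times faster than the assumed reading pace. The exact ratio shifts with the tokenizer and the reader.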