| HN Mirror

	by brucethemoose2 1106 days ago
	Have you tried the most recent CUDA offload? A dev claims they are getting 26.2 ms/token (about 38 tokens per second) on a 13B model with a 4080.
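
	For anyone checking the numbers: per-token latency and throughput are reciprocals, so the conversion is a one-liner. A minimal sketch (the function name `tokens_per_second` is just illustrative and not from any library):

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    # 1000 ms in a second, so throughput is 1000 divided by the latency.
    return 1000.0 / ms_per_token


# The figure quoted above: 26.2 ms/token.
print(f"{tokens_per_second(26.2):.1f} tokens/s")
```

	That works out to roughly 38 tokens per second, matching the claim.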