


	by gcr 60 days ago
	On TheTom’s llama-cpp fork, TurboQuant makes inference about five to ten times slower than vanilla (M1 Max, qwen3.6-35b-a3b). It seems a production-ready implementation is still a ways off. I'm excited to see how far we can bring that slowdown down, though.