HN Mirror

	by naasking 67 days ago
	I'm curious whether frontier labs use any form of compression, such as quantization, on their models to improve inference speed. The small % quality drop from Q8 or FP8 would still put it ahead of Opus, and should roughly double token throughput relative to 16-bit weights. Maybe then interactive use would feel like an improvement.
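
	For what it's worth, the "double" figure is what a simple memory-bandwidth-bound estimate gives: if decoding one token means reading every weight once, halving the bytes per weight halves the time per token. A rough sketch of that arithmetic, assuming single-stream decoding and ignoring KV cache traffic, compute limits, and batching. The parameter count and bandwidth below are made-up illustrative numbers, and `decode_tokens_per_second` is just a name I picked:

```python
def decode_tokens_per_second(n_params: float,
                             bytes_per_param: float,
                             bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed when every weight is read once per token.

    Assumption: decoding is purely memory-bandwidth-bound, so time per token
    is (total weight bytes) / (memory bandwidth).
    """
    weight_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


# Hypothetical numbers, for illustration only.
N_PARAMS = 70e9     # 70B parameters (assumed)
BANDWIDTH = 2e12    # 2 TB/s memory bandwidth (assumed)

fp16_tps = decode_tokens_per_second(N_PARAMS, 2.0, BANDWIDTH)  # 2 bytes per weight
fp8_tps = decode_tokens_per_second(N_PARAMS, 1.0, BANDWIDTH)   # 1 byte per weight
speedup = fp8_tps / fp16_tps

print(f"16-bit: {fp16_tps:.1f} tok/s, 8-bit: {fp8_tps:.1f} tok/s, speedup: {speedup:.1f}x")
```

	The 2x is a ceiling under those assumptions; real deployments that batch many requests are often closer to compute-bound, where the gain depends on hardware support for 8-bit math.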