| HN Mirror



	by chessgecko 829 days ago
	Also, just to add: I think the 1.58-bit approach is mostly faster for inference, because training still has to multiply a lot of floating-point gradients by integer activations, hold floating-point weights/gradients for rounding, and deal with norms and stuff. Could be wrong about that though.
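
	For what it's worth, here is a rough NumPy sketch of that asymmetry. It assumes an absmean-style rounding of weights to {-1, 0, +1} and a straight-through estimator for the backward pass. The function names (`ternarize`, `ternary_matmul`, `train_step`) are made up for illustration, activations are kept in float for simplicity (no integer activation quantization), and none of this is taken from a particular implementation.

```python
import numpy as np

def ternarize(w, eps=1e-8):
    """Round float weights to {-1, 0, +1} with one per-tensor scale.

    Assumption: absmean-style scaling (divide by the mean absolute weight).
    """
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_q, scale

def ternary_matmul(x, w_q, scale):
    """Inference path: only the ternary weights and one scale are needed.

    With weights in {-1, 0, +1}, each output is a sum of inputs that are
    added or subtracted, followed by a single multiply by the scale.
    The masks are applied with @ here only to keep the sketch short.
    """
    added = x @ (w_q == 1).astype(x.dtype)
    subtracted = x @ (w_q == -1).astype(x.dtype)
    return (added - subtracted) * scale

def train_step(w_latent, x, y, lr=0.1):
    """Training path: the float "latent" weights have to stay around.

    The forward pass uses the rounded weights, but the gradient is a float
    matmul and the update is applied to the float weights. Rounding is
    treated as identity in the backward pass (straight-through estimator).
    """
    w_q, scale = ternarize(w_latent)
    pred = x @ (w_q.astype(np.float32) * scale)
    grad_pred = 2.0 * (pred - y) / pred.size   # float gradient of an MSE loss
    grad_w = x.T @ grad_pred                   # float gradient times activations
    return w_latent - lr * grad_w              # update the float copy, re-round next step

# Tiny usage example with made-up numbers.
w = np.array([[0.9, -0.05], [-1.2, 0.4]], dtype=np.float32)
x = np.array([[2.0, 3.0]], dtype=np.float32)
y = np.zeros((1, 2), dtype=np.float32)

w_q, scale = ternarize(w)
out = ternary_matmul(x, w_q, scale)   # inference: ternary weights only
w_next = train_step(w, x, y)          # training: float weights and gradients
```

	So the inference side only ever touches `w_q` and `scale`, while the training side still carries `w_latent` and a full float gradient around, which is roughly the point above.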