| HN Mirror



	by GaggiX 1123 days ago
	I was going to search online for this, but then I realized you are the author (and I don't think there is anything online about it). I imagine the activations are kept in FP16 and the weights are converted to FP16 during inference, right? Btw, very cool.

1 comment

liuliu 1122 days ago

Yes, computation is carried out in FP16, so there are no compute-efficiency gains, though there might be latency reductions from the memory-bandwidth savings. Those savings are not realized yet because no custom kernels have been introduced yet.
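
For readers following along, here is a rough sketch of the general idea (weights stored compactly, expanded to FP16 just before use, arithmetic done in FP16). It is not the author's implementation: the int8-codes-plus-one-scale-per-row storage format and all function names (`to_fp16`, `quantize_row`, `dequantize_row`, `matvec_fp16`, `storage_bytes`) are assumptions made purely for illustration, and FP16 rounding is emulated with the standard library's `struct` half-precision format.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def quantize_row(row):
    """Store one weight row as int8 codes plus one FP16 scale.

    Assumed storage scheme for illustration only; the thread does not
    say which format the weights are actually stored in.
    """
    max_abs = max(abs(w) for w in row)
    # Fall back to 1.0 if the scale is zero (all-zero row or FP16 underflow).
    scale = to_fp16(max_abs / 127.0) or 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Convert the stored codes back to FP16 weights at inference time."""
    return [to_fp16(c * scale) for c in codes]


def matvec_fp16(weight_rows, activations):
    """Matrix-vector product with every intermediate rounded to FP16."""
    out = []
    for row in weight_rows:
        acc = 0.0
        for w, a in zip(row, activations):
            acc = to_fp16(acc + to_fp16(w * a))
        out.append(acc)
    return out


def storage_bytes(rows):
    """Return (fp16_bytes, quantized_bytes) for a weight matrix."""
    n = sum(len(r) for r in rows)
    fp16_bytes = 2 * n               # two bytes per FP16 weight
    quant_bytes = n + 2 * len(rows)  # one byte per code + one FP16 scale per row
    return fp16_bytes, quant_bytes


weights = [[0.5, -0.25, 0.125], [1.0, 0.0, -1.0]]
activations = [1.0, 2.0, 4.0]  # activations stay in FP16 throughout

stored = [quantize_row(r) for r in weights]            # what sits in memory
expanded = [dequantize_row(c, s) for c, s in stored]   # converted to FP16 on the fly
result = matvec_fp16(expanded, activations)            # same FP16 math as before
```

The multiply-accumulate loop is unchanged from a plain FP16 model, which is why there is no compute gain; the only thing that shrinks is the number of bytes read for the weights, and turning that into lower latency would need a kernel that fuses the conversion into the matrix multiply.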
