LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale | HN Mirror


	LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale (arxiv.org)
	7 points by ofirpress 1398 days ago

1 comment

ofirpress 1398 days ago

Cool new efficient inference method that halves memory use and does not degrade performance for large language models!

More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792
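
For a rough feel of where the memory saving mentioned above comes from, here is a minimal sketch, assuming the baseline is 16-bit weights: storing each value as int8 takes one byte instead of two. This is plain absmax quantization on a toy list, not the full LLM.int8() procedure (the paper additionally uses vector-wise scaling and keeps outlier features in 16-bit). The names `absmax_quantize` and `dequantize` are made up for this example.

```python
import struct


def absmax_quantize(values):
    """Map floats to int8 using one absmax scale (simplified, illustrative only)."""
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        return [0] * len(values), 1.0
    # Round to the nearest integer and clamp into the symmetric int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from int8 values and the stored scale."""
    return [x * scale for x in q]


weights = [0.5, -1.0, 0.25, 0.75]  # toy data, not real model weights
q, scale = absmax_quantize(weights)
restored = dequantize(q, scale)

# Serialized size: 'e' is IEEE half precision (2 bytes), 'b' is signed char (1 byte).
fp16_bytes = len(struct.pack(f"<{len(weights)}e", *weights))
int8_bytes = len(struct.pack(f"<{len(q)}b", *q))

print(f"fp16: {fp16_bytes} bytes, int8: {int8_bytes} bytes")
print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The single scale factor adds a small constant overhead, which is negligible for large weight matrices; the rounding error per value is bounded by the scale.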