Hacker News
LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale (arxiv.org)
7 points by ofirpress 1398 days ago
1 comment

Cool new efficient inference method that halves the memory needed to run large language models without degrading performance!

More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792
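
For anyone wondering where the 2x comes from: a 16-bit weight takes 2 bytes and an 8-bit weight takes 1. Below is a rough sketch of plain absmax int8 quantization on a single row of weights, just to show the arithmetic. It is an assumption-laden illustration, not the paper's full method (which, as far as I understand it, also treats outlier features separately in higher precision). The function names and the sample values are made up for the example.

```python
# Illustrative sketch only: plain absmax int8 quantization of one weight row.
# Not the full LLM.int8() method; names and values here are hypothetical.
from array import array


def quantize_absmax(row):
    """Map floats to signed 8-bit integers in [-127, 127] with one scale per row."""
    absmax = max(abs(x) for x in row)
    # Guard against an all-zero row so we never divide by zero.
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = array("b", (int(round(x / scale)) for x in row))  # "b" = signed char, 1 byte
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.75]  # hypothetical weight values
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)

fp16_bytes = 2 * len(weights)      # 2 bytes per 16-bit weight
int8_bytes = q.itemsize * len(q)   # 1 byte per 8-bit weight

print("int8 codes:", list(q))
print("bytes at 16-bit:", fp16_bytes, "bytes at 8-bit:", int8_bytes)
```

The per-row scale is a small extra cost that this sketch ignores in the byte count; for long rows it is negligible next to the weights themselves.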