Hacker News
LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale (arxiv.org)
7 points by ofirpress 1398 days ago | 1 comment
ofirpress 1398 days ago
Cool new efficient inference method that halves memory use (8-bit instead of 16-bit weights) without degrading performance for large language models!
More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792