Hacker News
by ofirpress 1398 days ago
Cool new efficient inference method for large language models: it halves memory use without degrading performance!
More from the author about this at:
https://twitter.com/Tim_Dettmers/status/1559892888326049792