Hacker News
by ofirpress 1398 days ago
Cool new efficient inference method for large language models: it cuts memory use by 2x without degrading performance!

More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792