Hacker News
Simple, zero overhead way to compress model, KV cache via Low-Rank Decomposition (jeffreywong20.github.io)
1 point by thw20 37 days ago