Hacker News
Q8 KV cache lets a 30B model fit 100K context on a 24 GB RTX 5090 (buraak.com)
3 points by bozdemir 48 days ago