by xadhominemx 228 days ago
It’s because the model weights and KV cache are stored in SRAM, which makes it extremely expensive per token.
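
To make the capacity argument concrete, here's a rough back-of-the-envelope sketch of how many chips you'd need if weights and KV cache both have to fit in on-chip SRAM. Every number in it (model size, layer count, SRAM per chip) is a made-up placeholder, not a spec for any real hardware or model, and the function names are just for illustration.

```python
import math


def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value):
    """Size of the KV cache for one sequence: a key and a value tensor per layer."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


def chips_needed(param_count, bytes_per_param, kv_bytes, sram_bytes_per_chip):
    """Chips required if weights plus KV cache must all sit in on-chip SRAM."""
    total_bytes = param_count * bytes_per_param + kv_bytes
    return math.ceil(total_bytes / sram_bytes_per_chip)


if __name__ == "__main__":
    # Hypothetical model and chip, chosen only to show the arithmetic.
    kv = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                        context_tokens=8192, bytes_per_value=2)
    n = chips_needed(param_count=70_000_000_000, bytes_per_param=1,
                     kv_bytes=kv, sram_bytes_per_chip=256 * 1024**2)
    print(f"KV cache: {kv / 1e9:.2f} GB, chips needed: {n}")
```

The point is just that SRAM per chip is small compared with the model, so the chip count (and hence the hardware cost amortized over each token) grows quickly with model size and context length.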