Hacker News
by xadhominemx 228 days ago
It’s because the model weights and KV cache are stored in SRAM. It’s extremely expensive per token.