Hacker News new | ask | show | jobs
by chessgecko 841 days ago
The problem is that it’s probably often not a lot cheaper. Most of the high end gpus have comparatively little bandwidth over pcie (that you’d need to use to store the context on a nvme for example). The cost there would scale with length too so you wouldn’t necessarily save more in that situation either. I think if you used a small enough gqa ratio and you knew for sure you would reuse the weights it could work, but my suspicion is that in general it would just be cheaper to recalculate.