Hacker News
by why_only_15 1006 days ago
You can keep the KV cache around from previous generations, which significantly lowers the cost of any prompt that shares a prefix with them.
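Rough sketch of what that looks like, as a toy in plain Python. Everything here is hypothetical and only for illustration: `project` stands in for a real model's key/value projections, and `KVCache`, `extend` and `attend` are made-up names, not any library's API. The point is only that when a new prompt starts with tokens you have already processed, you pay for the new suffix and nothing else.

```python
import math

def project(token_id, pos, dim=4):
    """Hypothetical stand-in for a model's per-token key/value projections."""
    key = [math.sin(token_id * (i + 1) + pos) for i in range(dim)]
    value = [math.cos(token_id * (i + 1) + pos) for i in range(dim)]
    return key, value

class KVCache:
    """Toy single-head KV cache that is kept across generations."""

    def __init__(self):
        self.tokens = []   # tokens whose keys/values are already stored
        self.keys = []
        self.values = []
        self.computed = 0  # total number of tokens we had to project

    def extend(self, prompt):
        """Make the cache cover `prompt`; return how many tokens were computed."""
        # Length of the prefix shared by the cached tokens and the new prompt.
        shared = 0
        while (shared < len(self.tokens) and shared < len(prompt)
               and self.tokens[shared] == prompt[shared]):
            shared += 1
        # Entries past the shared prefix belong to a different continuation.
        del self.tokens[shared:]
        del self.keys[shared:]
        del self.values[shared:]
        # Only the tokens that are not cached yet need any work.
        for pos in range(shared, len(prompt)):
            key, value = project(prompt[pos], pos)
            self.tokens.append(prompt[pos])
            self.keys.append(key)
            self.values.append(value)
            self.computed += 1
        return len(prompt) - shared

    def attend(self, query):
        """Softmax attention of one query vector over the cached keys/values."""
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in self.keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) / total
                for i in range(dim)]

cache = KVCache()
first = cache.extend([1, 2, 3, 4, 5])         # cold cache: every token is computed
second = cache.extend([1, 2, 3, 4, 5, 6, 7])  # warm cache: only the two new tokens
print(first, second, cache.computed)
```

In a real transformer the keys and values at each layer depend on the earlier tokens too, but attention is causal, so a cached prefix stays valid as long as the new prompt starts with exactly the same tokens.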