Hacker News
by YetAnotherNick 843 days ago
My calculation of the KV cache size gives about 1 GB per 3,000 tokens at fp16. I am surprised OpenAI's competitors haven't done this. This kind of feature has not-so-niche uses wherever prefix data could be cached.
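A rough sketch of that arithmetic. The comment doesn't name a model, so the config here is an assumption: 80 layers, 8 KV heads (grouped-query attention) and head dimension 128, which roughly matches published Llama 2 70B settings. Each token stores one key vector and one value vector per KV head per layer, at 2 bytes per element for fp16:

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    # Factor 2: one key and one value vector per KV head per layer.
    # bytes_per_elem=2 corresponds to fp16.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def tokens_per_gb(bytes_per_token: int, gb: float = 1.0) -> float:
    # Uses decimal gigabytes (1 GB = 1e9 bytes).
    return gb * 1e9 / bytes_per_token


# Assumed config (not from the comment): 80 layers, 8 KV heads, head_dim 128.
per_token = kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)
print(per_token)                        # 327680
print(round(tokens_per_gb(per_token)))  # 3052
```

A model without grouped-query attention keeps a KV entry for every attention head, so its per-token cost is several times higher and far fewer tokens fit in each GB.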