by jeremyjh 58 days ago
This has nothing to do with the cost of storage. Surprisingly, you are not better informed than Anthropic on the subject of serving AI inference models.

A sibling comment explains:

https://news.ycombinator.com/item?id=47886200


They don't cache model state to disk. I am proposing they do.
I’m proposing that you should educate yourself on the subject of LLM KV context caching.