| HN Mirror



	by halfwhey 49 days ago
	Might be a dumb question, but do you have to read the files in the same order in new sessions to ensure the correct prefix for the cache?

3 comments

weiliddat 49 days ago

Also curious. With tool calls reading and searching different files, and possibly compaction when reading a large codebase or long threads, I can't imagine how you hit a 99% cache rate.


WatchDog 49 days ago

Yes, you have to use the same session. I guess you could load up a bunch of context and then fork the session into a few different tasks, although I haven't tried it.
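
To illustrate why read order matters, here is a minimal sketch, assuming the cache only matches an exact leading prefix of the request. The message labels and the `cached_prefix_len` helper are hypothetical and do not reflect any provider's actual API.

```python
def cached_prefix_len(previous, current):
    """Count how many leading items of `current` match `previous` exactly.

    Assumption: the cache can only reuse an identical leading prefix,
    so matching stops at the first item that differs.
    """
    matched = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        matched += 1
    return matched


# Hypothetical session contents; each string stands in for a block of tokens.
first_session = ["system", "read:a.py", "read:b.py", "task-1"]
same_order = ["system", "read:a.py", "read:b.py", "task-2"]
swapped_order = ["system", "read:b.py", "read:a.py", "task-2"]

hits_same = cached_prefix_len(first_session, same_order)
hits_swapped = cached_prefix_len(first_session, swapped_order)

print(hits_same)     # 3: everything before the new task is reusable
print(hits_swapped)  # 1: only the system block matches before the order diverges
```

Under that assumption, forking from a shared loaded context keeps the whole shared part reusable, while re-reading the same files in a different order does not.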


naaqq 49 days ago

Sorry, I was wrong here. I meant a single long session. And there's no compression; only about half of the 1M context is used.


gbgarbeb 48 days ago

Then where did 200M come from? 200,000 tokens?


naaqq 47 days ago

Not all of the tokens counted as read are in the context at once; many of them come from cache read hits. I hit the cache many times, so the total grew to 200M. The number came from the API platform.
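
As a rough sketch of how a cumulative counter can far exceed the context size: the arithmetic below assumes every request re-reads the entire prior context from cache and that the context grows by a fixed amount per request. The request count and growth rate are made-up numbers chosen for illustration, not figures from this session.

```python
def cumulative_cache_reads(num_requests, tokens_added_per_request):
    """Sum the cached tokens read across a single growing session.

    Assumption: each request reads all previously accumulated context
    from cache, then appends a fixed number of new tokens.
    """
    total_cached = 0
    context = 0
    for _ in range(num_requests):
        total_cached += context          # prior context served from cache
        context += tokens_added_per_request
    return total_cached, context


# Hypothetical session: 800 requests, each adding 625 new tokens.
total_cached, final_context = cumulative_cache_reads(800, 625)

print(final_context)  # 500000 tokens in context at the end
print(total_cached)   # 199750000 cached tokens read in total
```

With these assumed numbers, the context never exceeds 500k tokens, yet the cache-read total lands near 200M because the same prefix is counted again on every request.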
