Also curious. With tool calls reading and searching different files, and possibly compaction when reading a large codebase or long threads, I can't imagine how you hit a 99% cache hit rate.
Yes, you have to use the same session. I guess you could load up a bunch of context and then fork the session into a few different tasks, although I haven't tried it.
Not all of the read tokens are in the context at once; many of them are cache read hits. Each request re-reads the cached prefix, so the count accumulates across requests. I hit the cache many times, so it grew to 200M. The number came from the API platform.
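To make the arithmetic concrete, here is a rough sketch of how cumulative cache reads can dwarf the context size within a single session. Everything in it is an assumption for illustration: the function names, the turn count, and the token sizes are made up, and it ignores output tokens, compaction, and cache expiry.

```python
def simulate_session(turns, initial_context, new_tokens_per_turn):
    """Sum cache-read and uncached input tokens over one session.

    Hypothetical model: every request resends the whole context, the part
    already seen is served from cache, and only new tokens are uncached.
    """
    cache_read = 0
    uncached = 0
    context = 0
    for turn in range(turns):
        added = initial_context if turn == 0 else new_tokens_per_turn
        cache_read += context  # previously seen prefix is read from cache
        uncached += added      # newly appended tokens are processed fresh
        context += added
    return cache_read, uncached, context


def hit_rate(cache_read, uncached):
    """Fraction of input tokens that were served from cache."""
    return cache_read / (cache_read + uncached)


# Made-up numbers: 100k tokens loaded up front, then 999 small tool-call turns.
cache_read, uncached, context = simulate_session(1000, 100_000, 100)
print(f"final context:  {context:,} tokens")
print(f"cache reads:    {cache_read:,} tokens")
print(f"uncached input: {uncached:,} tokens")
print(f"hit rate:       {hit_rate(cache_read, uncached):.2%}")
```

Under these assumptions the context never goes above about 200k tokens, but the cache-read total is in the hundreds of millions because the same prefix is counted again on every request.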