Hacker News new | ask | show | jobs
by mordae 24 days ago
Not OP, but I routinely load 150k tokens into context. A full sub-package to work on, select other files in the monorepo, e.g. front-end visualization and back-end data loader. Then work some 150k tokens, then start again.

At the end, cache hit rate is like 99.5% if Novita is not having issues.

For official DeepSeek API, 99.9% or something.

Custom harness that never compacts or otherwise doctors the history.

1 comments

Those numbers make sense to me...120 million input tokens is like 120 sessions of hitting the full context limit, which seems like a lot to me though