Reads like a LOT of tokens to me. What does your usage/workflow look like? I'm very curious because, although I do use Claude Code, my token counts aren't nearly as high.
Simply Ruby on Rails. I maintain 3 markdown documents in the repo: system design, implementation plan, and use cases. Then I tell it those files exist and to go implement feature X (from the implementation plan). These documents, plus AGENTS.md, declare completion criteria, which include full code coverage with both system and controller tests.
Usually I don't tell it to implement something ad hoc; I implement it in the documents first. LLMs are quite good at keeping those documents in sync.
A big benefit of the implementation plan is that it keeps the LLM on track. With it, the LLM can understand why something must not be done yet, so it includes less unsolicited functionality. My workflow can surely be improved, but it has worked well for me.
I'm not sure about the actual costs, because I started using the same subscription for document parsing. But even then, I used less than $10 in May.
Not OP, but I routinely load 150k tokens into context: a full sub-package to work on, plus select other files in the monorepo, e.g. front-end visualization and back-end data loader. Then I work for another 150k tokens or so, then start again.
At the end, the cache hit rate is like 99.5% if Novita is not having issues.
For the official DeepSeek API, 99.9% or something.
I use a custom harness that never compacts or otherwise doctors the history.
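To illustrate why an append-only history goes together with hit rates like these, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not the harness described above: it assumes the provider caches the exact prompt prefix of the previous request, and the turn count, tokens per turn, and the `rewrite_every` knob are all made-up parameters for illustration.

```python
from typing import Optional


def simulate_hit_rate(
    initial_context: int,
    turns: int,
    tokens_per_turn: int,
    rewrite_every: Optional[int] = None,
) -> float:
    """Fraction of input tokens served from a prefix cache over one session.

    Toy assumptions (not measured from any real provider):
    - every request resends the whole history as input;
    - after a request, the full prompt is cached as a prefix;
    - rewriting the history (e.g. compaction) invalidates the whole cache,
      and for simplicity the history length is left unchanged.
    """
    history = initial_context  # tokens currently in the conversation
    cached = 0                 # length of the prefix that is still cached
    total_input = 0
    total_hits = 0

    for turn in range(turns):
        if rewrite_every and turn > 0 and turn % rewrite_every == 0:
            cached = 0  # history was doctored, so the old prefix no longer matches

        total_input += history
        total_hits += min(cached, history)

        cached = history            # the prompt just sent is now cached
        history += tokens_per_turn  # append model output and tool results

    return total_hits / total_input


if __name__ == "__main__":
    # Load 150k tokens, then add roughly another 150k over 300 small turns.
    append_only = simulate_hit_rate(150_000, 300, 500)
    compacting = simulate_hit_rate(150_000, 300, 500, rewrite_every=10)
    print(f"append-only: {append_only:.4f}")
    print(f"rewritten every 10 turns: {compacting:.4f}")
```

Under these assumptions the append-only run lands at about 99.5%, since the only uncached tokens are the initial load plus the few hundred new tokens per turn. Any rewrite of earlier history pushes the number down, because the whole context has to be reprocessed afterwards.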